My first implementation of a VAE in TensorFlow, Python
About me
Hi, my name is Michele and I'm pursuing a Master's degree in Computer Science (Data Science specialization).
I'm very interested in Machine Learning, Deep Learning, and Data Science; in my free time I also study papers and articles and discuss these topics with others.
The problem I wanted to solve
I was very interested in the Variational AutoEncoder because it explicitly combines elements from Machine Learning and Statistics, in particular drawing on Variational Inference, which is grounded in Bayesian Inference and uses the calculus of variations. Since I wanted to build a generative model with a VAE, I picked a data type well suited to this application: images.
What is My first implementation of a VAE in TensorFlow, Python?
I trained a VAE and used it as a generative model to create new images by sampling from a bivariate prior distribution z ~ N(0, I).
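Generation with a trained VAE only needs the decoder. As a minimal sketch of the idea, assuming a toy dense decoder (not my actual architecture) that maps the 2-dimensional latent space to flattened 28x28 images:

```python
import tensorflow as tf

latent_dim = 2  # bivariate latent space, as in the post

# Hypothetical toy decoder: maps z in R^2 to a flattened 28x28 image.
decoder = tf.keras.Sequential([
    tf.keras.Input(shape=(latent_dim,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(28 * 28, activation="sigmoid"),
])

# Sample from the prior z ~ N(0, I) and decode into new images.
z = tf.random.normal(shape=(16, latent_dim))  # 16 draws from the prior
generated = decoder(z)  # shape (16, 784), pixel values in (0, 1)
```

In a real run the decoder would of course be the trained one; the point is only that new images come from decoding prior samples, with no encoder involved.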
Tech stack
I chose TensorFlow because I like the fine-grained control it gives you when building neural networks; the language was therefore Python.
The process of building
I implemented the VAE inside a single Python class for modularity, but after finishing it I think it's better to split the VAE into three classes: Encoder, LatentDistribution, and Decoder.
This representation gives more decoupling between classes, which improves code readability and maintainability.
A VAE class can then be built by composition over these three classes.
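A minimal sketch of that composition, with toy dense layers standing in for the real architecture (the class names match the post; the layer sizes are just placeholders):

```python
import tensorflow as tf

class Encoder(tf.keras.layers.Layer):
    """Maps an input image to the parameters (mu, log_var) of q(z|x)."""
    def __init__(self, latent_dim):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(64, activation="relu")
        self.mu = tf.keras.layers.Dense(latent_dim)
        self.log_var = tf.keras.layers.Dense(latent_dim)

    def call(self, x):
        h = self.hidden(x)
        return self.mu(h), self.log_var(h)

class LatentDistribution(tf.keras.layers.Layer):
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)."""
    def call(self, mu, log_var):
        eps = tf.random.normal(shape=tf.shape(mu))
        return mu + tf.exp(0.5 * log_var) * eps

class Decoder(tf.keras.layers.Layer):
    """Maps a latent sample z back to pixel space."""
    def __init__(self, output_dim):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(64, activation="relu")
        self.out = tf.keras.layers.Dense(output_dim, activation="sigmoid")

    def call(self, z):
        return self.out(self.hidden(z))

class VAE(tf.keras.Model):
    """Composes the three pieces instead of putting everything in one class."""
    def __init__(self, latent_dim, output_dim):
        super().__init__()
        self.encoder = Encoder(latent_dim)
        self.latent = LatentDistribution()
        self.decoder = Decoder(output_dim)

    def call(self, x):
        mu, log_var = self.encoder(x)
        z = self.latent(mu, log_var)
        return self.decoder(z), mu, log_var

vae = VAE(latent_dim=2, output_dim=784)
x = tf.random.uniform((4, 784))
reconstruction, mu, log_var = vae(x)
```

Each piece can now be tested or swapped independently, which is the decoupling benefit mentioned above.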
Challenges I faced
I implemented the whole model in one go. It was almost working, but I spent a few days understanding why the model produced constant output images while the loss (ELBO) was very low, around 0.41. After checking other online implementations of VAEs, I noticed that I had kept the default reduction mode in both the cross-entropy and the Kullback-Leibler terms of the loss function, which led to an incorrectly computed loss. In my opinion it's better to avoid these parameters and use tf.reduce_sum or tf.reduce_mean explicitly.
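To illustrate the mismatch on toy data (this is a sketch of the pitfall, not my training code): the Keras cross-entropy helper averages over the last axis, while the reconstruction term of the ELBO needs a sum over pixels per image.

```python
import tensorflow as tf

x = tf.random.uniform((4, 784))      # a toy batch of "images"
x_hat = tf.random.uniform((4, 784))  # toy reconstructions in (0, 1)

# tf.keras.losses.binary_crossentropy reduces with a MEAN over the last
# axis, so each image contributes an average over its 784 pixels...
per_pixel_mean = tf.keras.losses.binary_crossentropy(x, x_hat)

# ...while the ELBO's reconstruction term needs a SUM over pixels per
# image. Writing the reduction explicitly avoids the silent mismatch.
eps = 1e-7  # clip to avoid log(0), matching Keras's internal epsilon
xh = tf.clip_by_value(x_hat, eps, 1.0 - eps)
per_image_sum = tf.reduce_sum(
    -(x * tf.math.log(xh) + (1.0 - x) * tf.math.log(1.0 - xh)), axis=-1)
loss = tf.reduce_mean(per_image_sum)  # explicit mean over the batch
```

With 784 pixels the two versions differ by a factor of 784, which is exactly how a "very low" loss can hide a reconstruction term that is far too weak relative to the KL term.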
Key learnings
I learned both the theory and the practice behind the Variational AutoEncoder, and it is very interesting: from deriving the ELBO with Variational Inference down to practical tricks. For example, when implementing the layers that output the parameters of a Normal distribution, it's better to estimate log(variance) rather than the variance itself, because the output of a neural network can be negative or positive while a variance must be strictly positive (and indeed e^(log(var)) = var > 0), and also because the log-variance term is reused directly in the ELBO loss.
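Both halves of that trick can be shown in a few lines (the numbers below are arbitrary examples, not model outputs):

```python
import tensorflow as tf

# The "variance" head of the encoder can output any real number, so we
# treat it as log(variance): exponentiating guarantees a positive variance.
log_var = tf.constant([[-3.2, 0.0, 4.7]])  # arbitrary real-valued outputs
var = tf.exp(log_var)                      # always > 0

# The same log_var tensor is reused directly in the closed-form KL term
# of the ELBO between N(mu, sigma^2) and the prior N(0, I):
#   KL = -0.5 * sum(1 + log(var) - mu^2 - var)
mu = tf.constant([[0.5, -1.0, 2.0]])
kl = -0.5 * tf.reduce_sum(
    1.0 + log_var - tf.square(mu) - tf.exp(log_var), axis=-1)
```

So no positivity constraint (softplus, exp activation, etc.) is needed on the layer, and the KL term consumes the raw log_var output without any extra transformation.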
Implementation link
The implementation of my Variational AutoEncoder is here.
The image datasets used are "Fashion MNIST" and "Handwritten A-Z", so I could focus only on the VAE and not on preprocessing details.
Tips and advice
A summary of the tips described in the points above:
- Split the VAE implementation into three classes: Encoder, LatentDistribution, and Decoder, then create a VAE class composed of these three.
- Avoid the reduction arguments when computing the loss function; use tf.reduce_sum or tf.reduce_mean explicitly.
- When parametrizing the latent distribution as a Normal, have the hidden layer estimate log(variance) instead of the variance, both to handle the possibly negative network output and to reuse the term in the loss function.
Final thoughts and next steps
I enjoyed generating synthetic images with a VAE, but this is only the starting point of neural networks as generative models. In fact, there are many evolutions of the VAE, such as beta-VAE, VAE-GAN, and so on. I also noticed a recent trend of combining Encoder/Decoder architectures with Attention mechanisms, so that is what I need to study next.
My current VAE implementation suffers from some problems, such as occasionally blurry images (though, from what I've read online, this is a common problem when a Gaussian distribution is used in a VAE) and, during input reconstruction, sometimes confusing one letter with another (though the confused letters are similar in shape).
Other times, when the VAE reconstructs an input, the output looks better than the original input. You can check these results in my kernel on Kaggle.