Biadversarial Variational Autoencoder
Arnaud Fickinger

Department of Computer Science
Ecole Polytechnique
Palaiseau, FRANCE
Abstract
In the original version of the variational autoencoder [1], Kingma et al. assume Gaussian distributions for the approximate posterior during inference and for the output during the generative process. These assumptions are convenient for computation: for example, the parameters of a neural network can be optimized easily with the reparametrization trick, and the KL divergence between two Gaussians can be computed in closed form. However, they result in blurry images because a Gaussian has difficulty representing multimodal distributions. We show that, using two adversarial networks, we can optimize the parameters without any Gaussian assumptions.
1 Introduction
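We want to maximize the evidence lower bound (ELBO) of the marginal likelihood $p_\theta(x)$. We can derive the ELBO with Jensen's inequality by marginalizing out the latent variable $z$ and introducing the approximate posterior $q_\phi(z|x)$:
$$\log p_\theta(x) = \log \int p_\theta(x, z)\,dz = \log \mathbb{E}_{q_\phi(z|x)}\left[\frac{p_\theta(x, z)}{q_\phi(z|x)}\right] \ge \mathbb{E}_{q_\phi(z|x)}\left[\log \frac{p_\theta(x, z)}{q_\phi(z|x)}\right] \tag{1}$$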
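As a quick numerical check of the bound, the following minimal Python sketch evaluates the log marginal likelihood and the ELBO on a hypothetical three-state discrete latent model. The probabilities are made up for illustration and do not come from the paper. The ELBO stays below the log marginal likelihood for an arbitrary approximate posterior and matches it when the approximate posterior equals the exact posterior.

```python
import math

def log_marginal(prior, likelihood):
    """log p(x) = log sum_z p(z) p(x|z) for a discrete latent z."""
    return math.log(sum(pz * px for pz, px in zip(prior, likelihood)))

def elbo(q, prior, likelihood):
    """E_q[log p(x, z) - log q(z)]; terms with q(z) = 0 contribute nothing."""
    return sum(
        qz * (math.log(pz * px) - math.log(qz))
        for qz, pz, px in zip(q, prior, likelihood)
        if qz > 0
    )

# Hypothetical toy model: three latent states, one fixed observation x.
prior = [0.5, 0.3, 0.2]           # p(z)
likelihood = [0.10, 0.60, 0.30]   # p(x | z) for the observed x

# Exact posterior p(z | x) by Bayes' rule.
evidence = math.exp(log_marginal(prior, likelihood))
posterior = [pz * px / evidence for pz, px in zip(prior, likelihood)]

log_px = log_marginal(prior, likelihood)
elbo_uniform = elbo([1 / 3, 1 / 3, 1 / 3], prior, likelihood)  # arbitrary q
elbo_exact = elbo(posterior, prior, likelihood)                # q = posterior
```

The gap between `log_px` and `elbo_uniform` is the KL divergence from the uniform choice of q to the exact posterior, which is why a more flexible q can tighten the bound.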
2 Inference
Many works on variational autoencoders assume a Gaussian distribution for the approximate posterior:
$$q_\phi(z|x) = \mathcal{N}\left(z;\, \mu_\phi(x),\, \sigma_\phi^2(x) I\right) \tag{2}$$
where $\mu_\phi$ and $\sigma_\phi$ are neural network functions.
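The two computational conveniences of the Gaussian assumption mentioned in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's PyTorch code: the values of `mu` and `sigma` stand in for hypothetical encoder outputs, a standard normal prior is assumed, and the function names are invented for this example.

```python
import math
import random

def reparameterize(mu, sigma, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I), elementwise."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

def kl_diag_gaussian_to_standard(mu, sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * sum(m * m + s * s - 1.0 - 2.0 * math.log(s)
                     for m, s in zip(mu, sigma))

def log_normal(z, mu, sigma):
    """Log density of a diagonal Gaussian evaluated at z."""
    return sum(
        -0.5 * math.log(2.0 * math.pi) - math.log(s) - 0.5 * ((x - m) / s) ** 2
        for x, m, s in zip(z, mu, sigma)
    )

rng = random.Random(0)
mu, sigma = [0.5, -1.0], [0.8, 1.5]   # hypothetical encoder outputs for one x

kl_exact = kl_diag_gaussian_to_standard(mu, sigma)

# Monte Carlo estimate of the same KL from reparameterized samples.
n = 20000
zero, one = [0.0, 0.0], [1.0, 1.0]
kl_mc = 0.0
for _ in range(n):
    z = reparameterize(mu, sigma, rng)
    kl_mc += log_normal(z, mu, sigma) - log_normal(z, zero, one)
kl_mc /= n
```

The sampling step is a deterministic function of `mu`, `sigma` and external noise, which is what lets gradients flow to the encoder in an automatic differentiation framework. Both the closed form and the sampling rule depend on the Gaussian form of the posterior.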
This is convenient for computation but very restrictive for $q_\phi(z|x)$. We introduce an adversarial network that optimizes the parameters of the encoder without the need for any restrictive assumption. To do so, rearrange eq. (1):
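$$\mathbb{E}_{q_\phi(z|x)}\left[\log \frac{p_\theta(x, z)}{q_\phi(z|x)}\right] = \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right] + \mathbb{E}_{q_\phi(z|x)}\left[\log \frac{p(z)}{q_\phi(z|x)}\right] \tag{3}$$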
The objective being:
[TABLE]
where $\phi$ denotes the parameters of the encoder and $\theta$ denotes the parameters of the decoder.
Rewrite the second term of the ELBO in eq. (3) to bring out a Kullback-Leibler (KL) divergence:
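$$\mathbb{E}_{q_\phi(z|x)}\left[\log \frac{p(z)}{q_\phi(z|x)}\right] = -\mathrm{KL}\left(q_\phi(z|x)\,\|\,p(z)\right) \tag{5}$$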
This term corresponds to the KL divergence between the approximate posterior $q_\phi(z|x)$ and the prior $p(z)$. Note that it is the reverse KL divergence, i.e. the discrepancy between the two distributions is weighted by the approximate posterior, which is a better option for learning real modes in the case of a multimodal distribution. Inspired by [2], we define a network with an objective that differs slightly from that of the original adversarial network [3], so that the associated generator learns to minimize the reverse KL divergence instead of the Jensen-Shannon divergence. In so doing, we are able to optimize the encoder parameters without making any parametric assumption on the posterior. We introduce the discriminator network with the following objective:
[TABLE]
where the encoder parameters $\phi$ are fixed.
Inspired by [3], write the second term as an integral to find the optimal value of the discriminator:
[TABLE]
Given a pair in , the function reaches its maximum at . Hence the maximum of the integral is reached if:
[TABLE]
Substituting eq. (8) into eq. (6), we show that the optimal value function reached by the discriminator, the generator being fixed, is the KL divergence in eq. (5):
[TABLE]
In so doing we can optimize the second term of the ELBO with a minimax game with value function :
[TABLE]
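The argument of this section can be checked numerically under one explicit assumption: that the discriminator objective takes the f-GAN variational form for the KL divergence from the f-GAN paper cited above, namely E_q[T] - E_p[exp(T - 1)]. The paper's exact objective is not reproduced here, so this form, the toy distributions, and the tabular "discriminator" `T` are all assumptions made for illustration. Under that assumption the maximizer is T* = 1 + log(q / p) and the maximum equals KL(q || p).

```python
import math

def kl(q, p):
    """KL(q || p) for discrete distributions with full support."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))

def value(T, q, p):
    """Assumed f-GAN lower bound on KL(q || p): E_q[T] - E_p[exp(T - 1)]."""
    return sum(qi * ti for qi, ti in zip(q, T)) - sum(
        pi * math.exp(ti - 1.0) for pi, ti in zip(p, T)
    )

# Hypothetical toy distributions over four latent states.
q = [0.70, 0.20, 0.05, 0.05]   # stands in for the approximate posterior
p = [0.25, 0.25, 0.25, 0.25]   # stands in for the prior

# Gradient ascent on T with one free value per state.
T = [0.0] * len(q)
for _ in range(5000):
    grad = [qi - pi * math.exp(ti - 1.0) for qi, pi, ti in zip(q, p, T)]
    T = [ti + 0.5 * gi for ti, gi in zip(T, grad)]

# Analytic maximizer of the assumed objective.
T_star = [1.0 + math.log(qi / pi) for qi, pi in zip(q, p)]
```

With the discriminator at its optimum, `value(T, q, p)` coincides with `kl(q, p)`, so a generator that minimizes this value is minimizing the reverse KL divergence without ever evaluating the density of q in closed form.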
3 Generative process
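Many works on variational autoencoders also assume a Gaussian distribution for the output distribution:
$$p_\theta(x|z) = \mathcal{N}\left(x;\, \mu_\theta(z),\, \sigma^2 I\right) \tag{11}$$
where $\mu_\theta$ is a neural network function and $\sigma$ is a hyperparameter.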
The negative log-likelihood of this distribution is an affine function of the squared L2 norm of the reconstruction error, hence we often encounter an L2 reconstruction term in works on variational autoencoders:
[TABLE]
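The link between the Gaussian output assumption and the L2 reconstruction term can be verified directly. The sketch below assumes an isotropic Gaussian with fixed standard deviation `sigma`; the vectors `x` and `mean` are hypothetical values standing in for an observation and a decoder output. It computes the negative log-likelihood from the density itself and compares it with slope times squared error plus intercept.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """Density of N(mean, sigma^2 I) at x, as a product over coordinates."""
    density = 1.0
    for xi, mi in zip(x, mean):
        density *= math.exp(-0.5 * ((xi - mi) / sigma) ** 2) / (
            math.sqrt(2.0 * math.pi) * sigma
        )
    return density

def squared_l2(x, mean):
    """Squared Euclidean distance between x and mean."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mean))

x = [0.2, 0.9, 0.4]      # hypothetical observation
mean = [0.1, 0.7, 0.8]   # hypothetical decoder output
sigma = 0.5              # fixed hyperparameter

nll = -math.log(gaussian_pdf(x, mean, sigma))

# Affine form: slope * ||x - mean||^2 + intercept, with d = len(x).
slope = 1.0 / (2.0 * sigma ** 2)
intercept = 0.5 * len(x) * math.log(2.0 * math.pi * sigma ** 2)
affine = slope * squared_l2(x, mean) + intercept
```

Since the intercept does not depend on the decoder output, maximizing this likelihood is the same as minimizing a scaled squared error, and `sigma` only sets the weight of the reconstruction term relative to the KL term.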
Rearrange the first term of the objective in eq. (3) to bring out a direct KL divergence:
[TABLE]
This time we choose an adversarial objective so that the associated generator learns to minimize the direct KL divergence. We introduce a second discriminator network with the following objective:
[TABLE]
where the parameters $\phi$ and $\theta$ are fixed.
Write the second term as an integral to find the optimal value of the discriminator:
[TABLE]
Given a pair in , the function reaches its maximum at . Hence the maximum of the integral is reached if:
[TABLE]
Substituting eq. (16) into eq. (14), we show that the optimal value function reached by the discriminator, the generator being fixed, is the direct KL divergence in eq. (13):
[TABLE]
In so doing we can optimize the first term of the ELBO with a minimax game with value function :
[TABLE]
Finally we have transformed the optimization of the ELBO into a minimax game involving two discriminators:
[TABLE]
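The two divergences used in Sections 2 and 3 behave differently when a single-mode model is fitted to a multimodal target. The sketch below illustrates this general property on a hypothetical one-dimensional example and is not the paper's experiment: the target is a two-bump mixture on a grid, the model is one bump of fixed width whose centre is chosen by exhaustive search, and "direct" is taken to mean the KL with the expectation under the target.

```python
import math

def normalize(w):
    """Scale non-negative weights so they sum to one."""
    total = sum(w)
    return [wi / total for wi in w]

def bump(grid, center, width):
    """Discretized Gaussian bump on the grid, normalized to sum to one."""
    return normalize([math.exp(-0.5 * ((x - center) / width) ** 2) for x in grid])

def kl(a, b):
    """KL(a || b) on a shared grid; terms with a = 0 contribute nothing."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

grid = [i * 0.05 for i in range(-160, 161)]   # points in [-8, 8]

# Bimodal target with modes at -3 and +3.
target = normalize(
    [a + b for a, b in zip(bump(grid, -3.0, 0.5), bump(grid, 3.0, 0.5))]
)

centers = [c * 0.5 for c in range(-8, 9)]     # candidate centres in [-4, 4]

# Direct KL: expectation under the target.
best_direct = min(centers, key=lambda c: kl(target, bump(grid, c, 0.5)))
# Reverse KL: expectation under the model.
best_reverse = min(centers, key=lambda c: kl(bump(grid, c, 0.5), target))
```

In this toy setting the direct KL places the single bump between the two modes, where the target has almost no mass, while the reverse KL locks onto one of the modes. Which behaviour is preferable depends on which distribution plays the role of the model in each term.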
4 Implementation
The model is implemented with PyTorch. The implementation is available here:
References
[1] Diederik P. Kingma and Max Welling. Auto-Encoding Variational Bayes. arXiv e-prints, arXiv:1312.6114, December 2013.
[2] Sebastian Nowozin, Botond Cseke, and Ryota Tomioka. f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization. arXiv e-prints, arXiv:1606.00709, June 2016.
[3] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative Adversarial Networks. arXiv e-prints, arXiv:1406.2661, June 2014.
