Improving Conditional VAE with Non-Volume Preserving transformations
Tuhin Subhra De

TL;DR
This paper improves conditional Variational Autoencoders (CVAEs) by using Non-Volume Preserving (NVP) transformations to model the distribution of the latent code given the labels, yielding better image quality and likelihood than previous CVAE methods.
Contribution
It introduces NVP transformations to estimate the conditional distribution of the latent space given the labels in CVAEs. This relaxes the assumption in prior work that this distribution equals the prior, and it improves generative performance.
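To make the contribution concrete, here is a minimal NumPy sketch of a label-conditioned flow in the RealNVP style (affine coupling layers). It assumes an even latent dimension and one-hot labels. The class names, the single linear maps used for the scale s(·) and shift t(·), and their random untrained weights are hypothetical stand-ins, not the paper's architecture. The flow evaluates log p(z | y) through the change-of-variables formula and draws samples by inverting the coupling layers.

```python
import numpy as np


class ConditionalAffineCoupling:
    """RealNVP-style affine coupling layer conditioned on a label vector y.

    Hypothetical sketch: s(.) and t(.) are single linear maps with random,
    untrained weights; a trained model would use neural networks here.
    """

    def __init__(self, dim, label_dim, flip=False, seed=0):
        assert dim % 2 == 0, "this sketch assumes an even latent dimension"
        rng = np.random.default_rng(seed)
        self.d = dim // 2
        self.flip = flip  # alternate layers transform the other half
        self.Ws = rng.normal(scale=0.1, size=(self.d + label_dim, self.d))
        self.Wt = rng.normal(scale=0.1, size=(self.d + label_dim, self.d))

    def _split(self, z):
        # Returns (conditioning half, transformed half).
        a, b = z[:, :self.d], z[:, self.d:]
        return (b, a) if self.flip else (a, b)

    def _merge(self, cond, trans):
        return np.concatenate([trans, cond] if self.flip else [cond, trans], axis=1)

    def _scale_shift(self, cond, y):
        h = np.concatenate([cond, y], axis=1)
        s = np.tanh(h @ self.Ws)  # bounded log-scale keeps exp(s) stable
        t = h @ self.Wt
        return s, t

    def forward(self, z, y):
        """Map z -> u; also return log|det du/dz| for each sample."""
        cond, trans = self._split(z)
        s, t = self._scale_shift(cond, y)
        return self._merge(cond, trans * np.exp(s) + t), s.sum(axis=1)

    def inverse(self, u, y):
        """Exact inverse of forward: u -> z."""
        cond, out = self._split(u)
        s, t = self._scale_shift(cond, y)
        return self._merge(cond, (out - t) * np.exp(-s))


class ConditionalNVPPrior:
    """Stack of coupling layers defining a density p(z | y) (sketch)."""

    def __init__(self, dim, label_dim, n_layers=4):
        self.dim = dim
        self.layers = [ConditionalAffineCoupling(dim, label_dim, flip=(i % 2 == 1), seed=i)
                       for i in range(n_layers)]

    def log_prob(self, z, y):
        # Change of variables: log p(z|y) = log N(f(z; y); 0, I) + log|det df/dz|
        u, total_logdet = z, np.zeros(z.shape[0])
        for layer in self.layers:
            u, logdet = layer.forward(u, y)
            total_logdet += logdet
        log_base = -0.5 * np.sum(u ** 2, axis=1) - 0.5 * self.dim * np.log(2 * np.pi)
        return log_base + total_logdet

    def sample(self, y, rng):
        # Draw u ~ N(0, I) and invert the flow to get z ~ p(z | y).
        u = rng.standard_normal((y.shape[0], self.dim))
        for layer in reversed(self.layers):
            u = layer.inverse(u, y)
        return u


rng = np.random.default_rng(42)
prior = ConditionalNVPPrior(dim=4, label_dim=3)
y = np.eye(3)[[0, 2]]        # two one-hot labels
z = prior.sample(y, rng)     # latent codes drawn from p(z | y)
print(prior.log_prob(z, y))  # per-sample log-density under p(z | y)
```

Conditioning s and t on y is what lets the latent density differ per label. If all weights are zero, every layer reduces to the identity and p(z | y) collapses back to the label-independent N(0, I) prior that earlier CVAEs assumed.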
Findings
Reduced FID by 4%, indicating better image quality
Increased log likelihood by 7.6%, showing improved model fit
Outperforms existing CVAE methods on benchmark metrics
Abstract
Variational Autoencoders and Generative Adversarial Networks remained the state-of-the-art (SOTA) generative models until 2022, when they were superseded by diffusion-based models. As a result, efforts to improve these traditional models have stagnated. In old-school fashion, we explore image generation with conditional Variational Autoencoders (CVAEs) to incorporate desired attributes into the images. VAEs are known to produce blurry images with limited diversity; we use a method that addresses this issue by treating the variance of the Gaussian decoder as a learnable parameter during training. Previous work on CVAEs assumed that the conditional distribution of the latent space given the labels equals the prior distribution, which does not hold in practice. We show that estimating it using Non-Volume Preserving (NVP) transformations results in better image generation than existing…
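The learnable decoder variance mentioned in the abstract can be illustrated with a short NumPy sketch. It assumes a single scalar σ shared by all pixels (one common choice, not necessarily the paper's exact parameterisation), and the function names are hypothetical. Because the Gaussian negative log-likelihood is ½‖x − x̂‖²/σ² + D log σ + const for D pixels, its minimiser over σ has the closed form σ² = per-pixel mean squared error, which the second function computes.

```python
import numpy as np


def gaussian_nll(x, x_hat, log_sigma):
    """Per-sample NLL of x under N(x_hat, sigma^2 I), summed over pixels.

    Assumption of this sketch: one scalar log_sigma shared by every pixel,
    treated as a learnable parameter alongside the decoder weights.
    """
    n = x.shape[0]
    d = x[0].size  # pixels per image
    sq_err = ((x - x_hat) ** 2).reshape(n, -1).sum(axis=1)
    return 0.5 * sq_err / np.exp(2 * log_sigma) + d * log_sigma + 0.5 * d * np.log(2 * np.pi)


def optimal_log_sigma(x, x_hat):
    """Closed-form minimiser of the batch-mean NLL: sigma^2 equals the per-pixel MSE."""
    return 0.5 * np.log(np.mean((x - x_hat) ** 2))


rng = np.random.default_rng(0)
x = rng.random((8, 28, 28))                      # hypothetical batch of images in [0, 1]
x_hat = x + 0.05 * rng.standard_normal(x.shape)  # stand-in for decoder means
ls = optimal_log_sigma(x, x_hat)
print(np.exp(ls))  # roughly 0.05, the noise level used to build x_hat
```

With a fixed σ = 1, the reconstruction term can be weak relative to the KL term, which is one commonly cited reason for blurry VAE samples; letting σ track the actual reconstruction error rebalances the two. In a full CVAE, this term would be combined with the KL divergence between the encoder posterior and the latent prior.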
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Domain Adaptation and Few-Shot Learning
