Improving Conditional VAE with Non-Volume Preserving transformations

Tuhin Subhra De

arXiv:2511.08946 · cs.LG · March 10, 2026

TL;DR

This paper enhances conditional Variational Autoencoders by using Non-Volume Preserving transformations to better model the latent space, leading to improved image quality and likelihood metrics over previous methods.

Contribution

It introduces the use of NVP transformations to estimate the conditional distribution of the latent space given the labels in CVAEs, replacing the assumption in prior work that this distribution equals the prior, and improving generative performance.
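To make the idea of an NVP transformation concrete, here is a minimal sketch of a single label-conditioned affine coupling layer in the RealNVP style. This summary does not describe the paper's actual architecture, so everything here is an assumption for illustration: the function names, the single linear map producing scale and shift, the one-hot label input, and the standard normal base distribution are all hypothetical.

```python
import numpy as np

def init_params(latent_dim, num_classes, seed=0):
    """Hypothetical small random weights for one coupling layer."""
    rng = np.random.default_rng(seed)
    d = latent_dim // 2                      # size of the pass-through half
    in_dim, out_dim = d + num_classes, latent_dim - d
    return {
        "Ws": 0.1 * rng.standard_normal((in_dim, out_dim)),
        "bs": np.zeros(out_dim),
        "Wt": 0.1 * rng.standard_normal((in_dim, out_dim)),
        "bt": np.zeros(out_dim),
    }

def _scale_shift(z1, label_onehot, params):
    # Scale and shift depend only on the untouched half and the label.
    h = np.concatenate([z1, label_onehot], axis=-1)
    s = np.tanh(h @ params["Ws"] + params["bs"])   # bounded log-scale
    t = h @ params["Wt"] + params["bt"]            # shift
    return s, t

def coupling_forward(z, label_onehot, params):
    """Map z -> u and return log|det du/dz| (nonzero, hence non-volume preserving)."""
    d = z.shape[-1] // 2
    z1, z2 = z[..., :d], z[..., d:]
    s, t = _scale_shift(z1, label_onehot, params)
    u2 = z2 * np.exp(s) + t
    return np.concatenate([z1, u2], axis=-1), s.sum(axis=-1)

def coupling_inverse(u, label_onehot, params):
    """Exact inverse of coupling_forward."""
    d = u.shape[-1] // 2
    u1, u2 = u[..., :d], u[..., d:]
    s, t = _scale_shift(u1, label_onehot, params)
    z2 = (u2 - t) * np.exp(-s)
    return np.concatenate([u1, z2], axis=-1)

def conditional_log_prob(z, label_onehot, params):
    """log p(z | y) by change of variables, assuming a standard normal base."""
    u, log_det = coupling_forward(z, label_onehot, params)
    k = u.shape[-1]
    log_base = -0.5 * (u ** 2).sum(axis=-1) - 0.5 * k * np.log(2.0 * np.pi)
    return log_base + log_det

params = init_params(latent_dim=4, num_classes=3)
z = np.array([[0.5, -1.0, 2.0, 0.3]])
y = np.array([[0.0, 1.0, 0.0]])            # one-hot label
u, log_det = coupling_forward(z, y, params)
z_back = coupling_inverse(u, y, params)    # recovers z up to float error
```

In practice several such layers would be stacked, alternating which half passes through unchanged, with small neural networks in place of the linear maps. The value returned by `conditional_log_prob` could then stand in for the fixed prior term when a label is given.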

Findings

01. Reduced FID by 4%, indicating better image quality
02. Increased log likelihood by 7.6%, showing improved model fit
03. Outperforms existing CVAE methods on benchmark metrics

Abstract

Variational Autoencoders and Generative Adversarial Networks remained the state-of-the-art (SOTA) generative models until 2022. They have since been superseded by diffusion-based models, and efforts to improve traditional models have stagnated as a result. In old-school fashion, we explore image generation with conditional Variational Autoencoders (CVAE) to incorporate desired attributes within the images. VAEs are known to produce blurry images with less diversity; we refer to a method that solves this issue by treating the variance of the Gaussian decoder as a learnable parameter during training. Previous works on CVAEs assumed that the conditional distribution of the latent space given the labels is equal to the prior distribution, which is not the case in reality. We show that estimating it using Non-Volume Preserving (NVP) transformations results in better image generation than existing…
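The learnable decoder variance mentioned in the abstract can be illustrated with a short sketch of the Gaussian reconstruction loss. The abstract does not say how the variance is parameterized, so this assumes a single scalar `log_sigma` shared across all pixels; the function name and the toy values are hypothetical.

```python
import numpy as np

def gaussian_nll(x, mu, log_sigma):
    """Negative log-likelihood of x under N(mu, sigma^2 I), summed per sample.

    log_sigma is assumed to be one learnable scalar (an illustrative choice).
    """
    var = np.exp(2.0 * log_sigma)
    per_pixel = 0.5 * ((x - mu) ** 2 / var + 2.0 * log_sigma + np.log(2.0 * np.pi))
    return per_pixel.reshape(x.shape[0], -1).sum(axis=1)

# Toy target and decoder mean for one flattened "image" of four pixels.
x = np.array([[0.0, 1.0, 2.0, 3.0]])
mu = np.array([[0.5, 0.5, 2.5, 2.5]])

# With sigma fixed at 1 the loss is squared error plus a constant.
nll_fixed = gaussian_nll(x, mu, log_sigma=0.0)

# For a scalar sigma, the loss is minimized when sigma^2 equals the mean squared error.
mse = np.mean((x - mu) ** 2)
nll_learned = gaussian_nll(x, mu, log_sigma=0.5 * np.log(mse))
```

With `log_sigma` fixed at 0 the objective reduces to squared error plus a constant, which is the usual fixed-variance setup. Letting the variance adapt changes how strongly the reconstruction term is weighted against the KL term.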

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Domain Adaptation and Few-Shot Learning