Zero-Variance Gradients for Variational Autoencoders

Zilei Shao; Anji Liu; Guy Van den Broeck

arXiv:2508.03587·cs.LG·February 27, 2026

TL;DR

This paper trains Variational Autoencoders (VAEs) with zero-variance gradients: restricting the decoder architecture lets the expected ELBO be computed analytically instead of being estimated from Monte Carlo samples of the latent variables, which the authors report leads to more stable and efficient training.

Contribution

It proposes Silent Gradients, an approach that obtains zero-variance gradients in VAEs by restricting the decoder architecture so that the expected ELBO can be computed analytically, with the aim of improving training stability and performance.

Findings

01. Analytic gradients outperform standard estimators in linear decoders.

02. The approach improves training stability across multiple datasets.

03. It enhances existing methods like reparameterization and Gumbel-Softmax.

Abstract

Training deep generative models like Variational Autoencoders (VAEs) requires propagating gradients through stochastic latent variables, which introduces estimation variance that can slow convergence and degrade performance. In this paper, we explore an orthogonal direction, which we call Silent Gradients. Instead of designing improved stochastic estimators, we show that by restricting the decoder architecture in specific ways, the expected ELBO can be computed analytically. This yields gradients with zero estimation variance as we can directly compute the evidence lower-bound without resorting to Monte Carlo samples of the latent variables. We first provide a theoretical analysis in a controlled setting with a linear decoder and demonstrate improved optimization compared to standard estimators. To extend this idea to expressive nonlinear decoders, we introduce a training paradigm that…
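
To make the linear-decoder case concrete, here is a minimal sketch, not the authors' implementation. It assumes a diagonal Gaussian encoder q(z|x) = N(mu, diag(sigma^2)), a linear decoder W z + b, and a unit-variance Gaussian likelihood, so the reconstruction term of the ELBO is a squared error up to constants. Under those assumptions the expectation has the closed form ||x - W mu - b||^2 + sum_j sigma_j^2 ||W[:, j]||^2. The function names and all numeric values are hypothetical and chosen only for illustration.

```python
import random


def expected_sq_error(x, W, b, mu, sigma):
    """Closed-form E_q ||x - (W z + b)||^2 for z ~ N(mu, diag(sigma^2))."""
    d, k = len(W), len(mu)
    # Squared error of the decoder output evaluated at the mean of q.
    mean_term = sum(
        (x[i] - b[i] - sum(W[i][j] * mu[j] for j in range(k))) ** 2
        for i in range(d)
    )
    # Contribution of the latent variance: sum_j sigma_j^2 * ||W[:, j]||^2.
    var_term = sum(
        sigma[j] ** 2 * sum(W[i][j] ** 2 for i in range(d))
        for j in range(k)
    )
    return mean_term + var_term


def mc_sq_error(x, W, b, mu, sigma, n_samples, rng):
    """Monte Carlo estimate of the same quantity from reparameterized samples."""
    d, k = len(W), len(mu)
    total = 0.0
    for _ in range(n_samples):
        # Reparameterization: z = mu + sigma * eps with eps ~ N(0, 1).
        z = [mu[j] + sigma[j] * rng.gauss(0.0, 1.0) for j in range(k)]
        total += sum(
            (x[i] - b[i] - sum(W[i][j] * z[j] for j in range(k))) ** 2
            for i in range(d)
        )
    return total / n_samples


# Hypothetical toy values: 3 observed dimensions, 2 latent dimensions.
rng = random.Random(0)
x = [1.0, -0.5, 2.0]
W = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
b = [0.1, 0.0, -0.2]
mu = [0.3, -0.7]
sigma = [0.5, 1.2]

exact = expected_sq_error(x, W, b, mu, sigma)
estimate = mc_sq_error(x, W, b, mu, sigma, 20000, rng)
print("closed form:", exact)
print("Monte Carlo:", estimate)
```

The closed-form value involves no sampling, so it and any gradient taken through it are deterministic, whereas the Monte Carlo estimate fluctuates with the random seed and the number of samples. The KL term of the ELBO between a diagonal Gaussian encoder and a standard normal prior is already analytic in standard VAEs, so it is omitted here.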
