Optimal Variance Control of the Score Function Gradient Estimator for Importance Weighted Bounds
Valentin Liévin, Andrea Dittadi, Anders Christensen, Ole Winther

TL;DR
This paper presents a new control variate for the importance weighted variational bound that significantly improves the signal-to-noise ratio of the score function gradient estimator, especially at large sample sizes, without using reparameterization.
Contribution
It introduces a novel control variate that extends VIMCO and improves how the SNR of the IWAE score function gradient estimator scales with the number of importance samples K, yielding superior variance reduction.
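For reference, the standard setting can be written as follows. The notation is ours and may differ from the paper's. It covers the IWAE bound, its score function gradient with respect to the variational parameters φ (no reparameterization), and VIMCO's leave-one-out control variate, which is the baseline the paper extends rather than the paper's new control variate.

```latex
\mathcal{L}_K = \mathbb{E}_{z_{1:K} \sim q_\phi(z \mid x)}\big[\hat{\mathcal{L}}_K\big],
\qquad
\hat{\mathcal{L}}_K = \log \frac{1}{K} \sum_{k=1}^{K} w_k,
\qquad
w_k = \frac{p_\theta(x, z_k)}{q_\phi(z_k \mid x)}

\nabla_\phi \mathcal{L}_K
  = \mathbb{E}\Big[\sum_{k=1}^{K} \big(\hat{\mathcal{L}}_K - c_k - v_k\big)\,
    \nabla_\phi \log q_\phi(z_k \mid x)\Big],
\qquad
v_k = \frac{w_k}{\sum_{j=1}^{K} w_j}

% VIMCO: c_k depends only on z_{-k} (all samples except z_k)
c_k^{\mathrm{VIMCO}} = \log \frac{1}{K}\Big(\sum_{j \neq k} w_j + \hat{w}_{-k}\Big),
\qquad
\hat{w}_{-k} = \exp\Big(\frac{1}{K-1}\sum_{j \neq k} \log w_j\Big)
```

The -v_k term comes from differentiating w_k = p_θ(x, z_k)/q_φ(z_k | x) directly. Any c_k that does not depend on z_k leaves the expectation unchanged, because E[∇_φ log q_φ(z_k | x)] = 0. The choice of c_k therefore only affects the variance, and that choice is where the paper's contribution lies.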
Findings
With the new control variate, the SNR of the estimator grows as √K, where K is the number of importance samples.
The method achieves superior variance reduction when training both continuous and discrete generative models.
It is competitive with state-of-the-art reparameterization-free gradient estimators.
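We assume the usual definition of the SNR: the magnitude of the expected gradient divided by its standard deviation, for a single gradient component g_K computed with K importance samples. A worked comparison of the two scaling regimes when K grows from 10 to 1000, a factor of 100:

```latex
\mathrm{SNR}(K) = \frac{\big|\mathbb{E}[g_K]\big|}{\sqrt{\mathrm{Var}[g_K]}}

\mathrm{SNR} \propto \sqrt{K}:\quad
  \frac{\mathrm{SNR}(1000)}{\mathrm{SNR}(10)} = \sqrt{100} = 10
\qquad
\mathrm{SNR} \propto 1/\sqrt{K}:\quad
  \frac{\mathrm{SNR}(1000)}{\mathrm{SNR}(10)} = \frac{1}{\sqrt{100}} = \frac{1}{10}
```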
Abstract
This paper introduces novel results for the score function gradient estimator of the importance weighted variational bound (IWAE). We prove that in the limit of large K (number of importance samples) one can choose the control variate such that the Signal-to-Noise ratio (SNR) of the estimator grows as √K. This is in contrast to the standard pathwise gradient estimator where the SNR decreases as 1/√K. Based on our theoretical findings we develop a novel control variate that extends on VIMCO. Empirically, for the training of both continuous and discrete generative models, the proposed method yields superior variance reduction, resulting in an SNR for IWAE that increases with K without relying on the reparameterization trick. The novel estimator is competitive with state-of-the-art reparameterization-free gradient estimators such as Reweighted Wake-Sleep (RWS) and the…
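The following minimal pure-Python sketch makes the score function estimator concrete. It uses VIMCO's leave-one-out control variate, which is the baseline the paper extends and not the paper's own control variate, on a hypothetical 1-D Gaussian toy model. The model, the variational mean `mu`, and the helper names (`vimco_grad_mu`, `snr`) are illustrative assumptions.

```python
import math
import random
import statistics
from typing import Sequence


def log_normal(x: float, mean: float, std: float) -> float:
    """Log density of N(mean, std**2) evaluated at x."""
    var = std * std
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def vimco_grad_mu(x: float, mu: float, k: int, rng: random.Random) -> float:
    """One score function estimate of dL_K/dmu with VIMCO baselines.

    Hypothetical toy model (assumption): p(z) = N(0, 1), p(x | z) = N(z, 1),
    q(z) = N(mu, 1), so d/dmu log q(z) = z - mu.
    """
    if k < 2:
        raise ValueError("VIMCO's leave-one-out baseline needs k >= 2")
    z = [rng.gauss(mu, 1.0) for _ in range(k)]
    # Log importance weights: log w_i = log p(x, z_i) - log q(z_i).
    log_w = [log_normal(zi, 0.0, 1.0) + log_normal(x, zi, 1.0)
             - log_normal(zi, mu, 1.0) for zi in z]
    lse = logsumexp(log_w)
    bound = lse - math.log(k)  # \hat{L}_K
    total = sum(log_w)
    grad = 0.0
    for i in range(k):
        # Replace log w_i by the mean of the other K-1 log-weights
        # (geometric mean), so the baseline c_i does not depend on z_i.
        replaced = list(log_w)
        replaced[i] = (total - log_w[i]) / (k - 1)
        baseline = logsumexp(replaced) - math.log(k)
        v_i = math.exp(log_w[i] - lse)  # normalized importance weight
        grad += (bound - baseline - v_i) * (z[i] - mu)
    return grad


def snr(x: float, mu: float, k: int, n_repeats: int, seed: int = 0) -> float:
    """Empirical SNR = |mean| / std over independent gradient estimates."""
    rng = random.Random(seed)
    grads = [vimco_grad_mu(x, mu, k, rng) for _ in range(n_repeats)]
    return abs(statistics.fmean(grads)) / statistics.stdev(grads)


if __name__ == "__main__":
    for k in (2, 8, 32):
        print(f"K={k:3d}  SNR={snr(x=2.0, mu=0.0, k=k, n_repeats=500):.3f}")
```

The `snr` helper measures |mean| / std across repeated estimates. This is the quantity whose growth in K the abstract discusses. The sketch itself makes no claim about how the SNR of VIMCO scales; it only shows how such a measurement could be set up. The baselines are recomputed naively in O(K²) per estimate for readability.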
