Gradient Estimation for Binary Latent Variables via Gradient Variance Clipping
Russell Z. Kunes, Mingzhang Yin, Max Land, Doron Haviv, Dana Pe'er, Simon Tavaré

TL;DR
This paper introduces bitflip-1, a gradient estimator with lower variance near the boundaries of the parameter space, and UGC, an aggregated estimator that combines it with DisARM to improve gradient estimation for models with binary latent variables.
Contribution
The paper proposes bitflip-1, a gradient estimator with reduced variance near the boundaries of the parameter space, and unbiased gradient variance clipping (UGC), an aggregated estimator that uses either a bitflip-1 or a DisARM update for each coordinate.
Findings
UGC has lower variance than DisARM across experiments.
UGC achieves optimal solutions in toy, VAE, and subset selection tasks.
bitflip-1 reduces variance at parameter space boundaries.
Abstract
Gradient estimation is often necessary for fitting generative models with discrete latent variables, in contexts such as reinforcement learning and variational autoencoder (VAE) training. The DisARM estimator (Yin et al. 2020; Dong, Mnih, and Tucker 2020) achieves state-of-the-art gradient variance for Bernoulli latent variable models in many contexts. However, DisARM and other estimators have potentially exploding variance near the boundary of the parameter space, where solutions tend to lie. To ameliorate this issue, we propose a new gradient estimator, bitflip-1, that has lower variance at the boundaries of the parameter space. As bitflip-1 has complementary properties to existing estimators, we introduce an aggregated estimator, unbiased gradient variance clipping (UGC), that uses either a bitflip-1 or a DisARM gradient update for each coordinate. We theoretically…
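For readers unfamiliar with the DisARM baseline named in the abstract, the following is a minimal NumPy sketch of a single-sample DisARM gradient for factorized Bernoulli variables with logits theta, following the form published by Dong, Mnih, and Tucker (2020). It is an illustration under stated assumptions, not the authors' code: the function names (`disarm_grad`, `exact_grad_1d`), the toy objective `f`, and the sample count are all hypothetical choices made here. It does not implement bitflip-1 or UGC, whose definitions are not given in this summary.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def disarm_grad(f, theta, u):
    """One DisARM sample of d/dtheta E_{b ~ Bernoulli(sigmoid(theta))}[f(b)].

    theta: logits, shape (d,). u: uniform noise in [0, 1), shape (d,).
    f: maps a binary vector of shape (d,) to a scalar.
    """
    p = sigmoid(theta)
    b = (u < p).astype(float)                # primary sample
    b_anti = ((1.0 - u) < p).astype(float)   # antithetic sample, reuses u
    sign = 1.0 - 2.0 * b_anti                # equals (-1) ** b_anti
    differ = (b != b_anti).astype(float)     # nonzero only where the pair disagrees
    return 0.5 * (f(b) - f(b_anti)) * sign * differ * sigmoid(np.abs(theta))


def exact_grad_1d(f, theta):
    """Closed-form gradient for a single Bernoulli variable (d = 1)."""
    p = sigmoid(theta)
    return p * (1.0 - p) * (f(np.array([1.0])) - f(np.array([0.0])))


# Hypothetical toy objective, chosen only for illustration.
def f(b):
    return float(np.sum((b - 0.45) ** 2))


rng = np.random.default_rng(0)
theta = np.array([0.3])
samples = np.array(
    [disarm_grad(f, theta, rng.uniform(size=theta.shape)) for _ in range(50_000)]
)
estimate = samples.mean(axis=0)   # Monte Carlo average of the estimator
exact = exact_grad_1d(f, theta)   # analytic gradient for comparison
print(estimate, exact)
```

Averaging many samples and comparing against the closed-form gradient is a simple way to check unbiasedness in one dimension. Repeating the comparison for logits of larger magnitude gives a rough empirical view of how the estimator behaves as the Bernoulli probability approaches 0 or 1.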
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification
Methods: TuckER
