Improved Gradient-Based Optimization Over Discrete Distributions
Evgeny Andriyash, Arash Vahdat, Bill Macready

TL;DR
This paper analyzes and improves gradient estimation methods for discrete distributions, proposing bias-reduction techniques that enhance performance in variational inference and binary optimization tasks.
Contribution
It introduces a simple bias-reduction method for the Gumbel-Softmax estimator and a new piece-wise linear relaxation, improving gradient estimates for discrete variables.
Findings
Reduced bias improves variational inference performance
New relaxation outperforms existing methods in binary optimization
Proposed methods have lower bias and comparable variance
Abstract
In many applications we seek to maximize an expectation with respect to a distribution over discrete variables. Estimating gradients of such objectives with respect to the distribution parameters is a challenging problem. We analyze existing solutions, including finite-difference (FD) estimators and continuous relaxation (CR) estimators, in terms of bias and variance. We show that the commonly used Gumbel-Softmax estimator is biased and propose a simple method to reduce this bias. We also derive a simpler piece-wise linear continuous relaxation that likewise has reduced bias. We demonstrate empirically that reduced bias leads to better performance in variational inference and on binary optimization tasks.
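The bias of a continuous-relaxation estimator mentioned in the abstract can be illustrated on a toy problem. The sketch below is not taken from the paper: it assumes a single Bernoulli variable z with success probability sigmoid(theta), a hypothetical objective f(z) = (z - 0.45)^2, and the binary form of the Gumbel-Softmax (Concrete) relaxation, in which z is replaced by zeta = sigmoid((theta + logistic noise) / tau). All function and variable names are illustrative assumptions. It compares the exact gradient of E[f(z)] with a Monte Carlo average of the relaxed pathwise gradient.

```python
import math
import random


def f(z):
    # Hypothetical toy objective; not from the paper.
    return (z - 0.45) ** 2


def f_prime(z):
    # Derivative of the toy objective with respect to z.
    return 2.0 * (z - 0.45)


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def exact_gradient(theta):
    # d/dtheta E[f(z)] for z ~ Bernoulli(sigmoid(theta)):
    # E[f] = p f(1) + (1 - p) f(0), and dp/dtheta = p (1 - p).
    p = sigmoid(theta)
    return p * (1.0 - p) * (f(1.0) - f(0.0))


def relaxed_gradient(theta, tau, num_samples, rng):
    # Monte Carlo average of the pathwise gradient of f(zeta), where
    # zeta = sigmoid((theta + logistic_noise) / tau) is the relaxed sample.
    total = 0.0
    for _ in range(num_samples):
        # Clamp u away from 0 and 1 so the logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        logistic_noise = math.log(u) - math.log(1.0 - u)
        zeta = sigmoid((theta + logistic_noise) / tau)
        # Chain rule: d zeta / d theta = zeta (1 - zeta) / tau.
        total += f_prime(zeta) * zeta * (1.0 - zeta) / tau
    return total / num_samples


rng = random.Random(0)
theta, tau = 0.0, 1.0
exact = exact_gradient(theta)
relaxed = relaxed_gradient(theta, tau, 200_000, rng)
print(f"exact gradient:   {exact:.4f}")
print(f"relaxed estimate: {relaxed:.4f}")
```

With theta = 0 and tau = 1 the relaxed sample zeta is uniform on (0, 1), so the expected relaxed gradient can be computed by hand as 1/60 (about 0.0167), whereas the exact gradient is 0.025. The gap is the bias of the relaxation in this toy setting. Lowering tau shrinks that gap but typically raises the variance of the estimate.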
Taxonomy
Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques
