Bias-Variance Tradeoffs in Single-Sample Binary Gradient Estimators
Alexander Shekhovtsov

TL;DR
This paper analyzes the bias and variance tradeoffs of popular binary gradient estimators like straight-through and Gumbel-Softmax, revealing their limitations and guiding better choices in models with binary variables.
Contribution
It provides a theoretical analysis of bias and variance in binary gradient estimators, clarifying their tradeoffs and exposing potential issues.
Findings
The straight-through estimator is simple but can have significant bias.
Gumbel-Softmax and variants exhibit tradeoffs between bias and variance.
Theoretical results reveal limitations of existing estimators in certain scenarios.
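To make the bias of straight-through concrete, here is a minimal sketch for a single Bernoulli variable. It is a toy illustration, not the paper's analysis. The quadratic loss, the value of `target`, the probability `p`, and the choice of the "identity" straight-through variant (backpropagating as if dx/dp = 1) are all assumptions made for this example. For x ~ Bernoulli(p), the expected loss is p f(1) + (1 - p) f(0), so the exact gradient with respect to p is f(1) - f(0).

```python
import random


def exact_gradient(f):
    # E[f(x)] = p*f(1) + (1-p)*f(0), so d/dp E[f(x)] = f(1) - f(0).
    return f(1.0) - f(0.0)


def st_estimates(df, p, n_samples, seed=0):
    # Identity straight-through (assumed variant): sample x ~ Bernoulli(p),
    # then use df(x) as the gradient estimate, as if dx/dp were 1.
    rng = random.Random(seed)
    return [df(1.0 if rng.random() < p else 0.0) for _ in range(n_samples)]


def mean_and_variance(values):
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


# Hypothetical toy loss f(x) = (x - target)^2 and its derivative.
target = 0.3


def loss(x):
    return (x - target) ** 2


def loss_grad(x):
    return 2.0 * (x - target)


p = 0.9
exact = exact_gradient(loss)  # analytically 1 - 2*target
st_mean, st_var = mean_and_variance(st_estimates(loss_grad, p, 100_000))

print(f"exact gradient: {exact:.3f}")
print(f"ST mean:        {st_mean:.3f}  (analytic: 2*(p - target))")
print(f"ST variance:    {st_var:.3f}  (analytic: 4*p*(1 - p))")
```

In this setup the straight-through mean is 2(p - target) while the exact gradient is 1 - 2·target, so the bias is 2p - 1. It vanishes only at p = 0.5. If the loss were linear in x, f'(x) would equal f(1) - f(0) and the same estimator would be unbiased.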
Abstract
Discrete and especially binary random variables occur in many machine learning models, notably in variational autoencoders with binary latent states and in stochastic binary networks. When learning such models, a key tool is an estimator of the gradient of the expected loss with respect to the probabilities of binary variables. The straight-through (ST) estimator gained popularity due to its simplicity and efficiency, in particular in deep networks where unbiased estimators are impractical. Several techniques were proposed to improve over ST while keeping the same low computational complexity: Gumbel-Softmax, ST-Gumbel-Softmax, BayesBiNN, FouST. We conduct a theoretical analysis of bias and variance of these methods in order to understand tradeoffs and verify the originally claimed properties. The presented theoretical results allow for better understanding of these methods and in some…
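As a rough illustration of the tradeoff mentioned in the abstract, the sketch below estimates the mean and variance of a binary Gumbel-Softmax style gradient at two temperatures. It is a toy experiment under stated assumptions, not the paper's analysis. The relaxation x = sigmoid((logit(p) + L) / tau) with logistic noise L, the quadratic loss, and the values of `p`, `target`, and `tau` are all chosen here for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relaxed_gradient_samples(df, p, tau, n_samples, seed=0):
    # Binary relaxation (assumed form): x = sigmoid((logit(p) + L) / tau),
    # with L ~ Logistic(0, 1) obtained as logit(u), u ~ Uniform(0, 1).
    rng = random.Random(seed)
    logit_p = math.log(p) - math.log(1.0 - p)
    samples = []
    for _ in range(n_samples):
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log() finite
        noise = math.log(u) - math.log(1.0 - u)
        x = sigmoid((logit_p + noise) / tau)
        # Chain rule: dx/dp = x*(1-x)/tau * d logit(p)/dp, with
        # d logit(p)/dp = 1 / (p*(1-p)).
        dx_dp = x * (1.0 - x) / (tau * p * (1.0 - p))
        samples.append(df(x) * dx_dp)
    return samples


def mean_and_variance(values):
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


# Hypothetical toy loss f(x) = (x - target)^2; exact gradient is f(1) - f(0).
target = 0.3


def loss_grad(x):
    return 2.0 * (x - target)


p = 0.9
exact = (1.0 - target) ** 2 - (0.0 - target) ** 2

results = {}
for tau in (1.0, 0.1):
    results[tau] = mean_and_variance(
        relaxed_gradient_samples(loss_grad, p, tau, 100_000)
    )
    mean, var = results[tau]
    print(f"tau={tau}: mean={mean:.3f} variance={var:.3f} (exact={exact:.3f})")
```

In this toy setting, lowering the temperature moves the relaxed sample closer to a binary value, but the estimate becomes noisier: the factor x(1 - x)/tau is near zero for most samples and large for a few.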
