Rao-Blackwellized Stochastic Gradients for Discrete Distributions
Runjing Liu, Jeffrey Regier, Nilesh Tripuraneni, Michael I. Jordan, and Jon McAuliffe

TL;DR
This paper introduces a Rao-Blackwellization technique that reduces the variance of stochastic gradient estimators for expectations over very large or countably infinite discrete sample spaces while leaving their bias unchanged, so unbiased estimators stay unbiased.
Contribution
It presents a novel variance reduction method applicable to any unbiased stochastic gradient estimator for discrete distributions, leveraging Rao-Blackwellization.
Findings
Variance reduction is demonstrated on a semi-supervised classification problem.
Performance improves on a pixel attention task.
Technique retains unbiasedness of estimators.
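The unbiasedness and variance claims above follow from the standard Rao-Blackwell argument, sketched here in generic notation. The symbols g (a base gradient estimator), z (the discrete sample), and T (a conditioning statistic) are assumptions introduced for illustration; this summary does not state which statistic the paper conditions on.

```latex
% Rao-Blackwellized estimator: average the base estimator g(z) over
% everything not determined by the statistic T.
\tilde{g} = \mathbb{E}\left[ g(z) \mid T \right]

% Tower property: the mean, and hence the bias, is unchanged.
\mathbb{E}[\tilde{g}] = \mathbb{E}\big[ \mathbb{E}[ g(z) \mid T ] \big] = \mathbb{E}[ g(z) ]

% Law of total variance (componentwise): the variance cannot increase.
\operatorname{Var}\big( g(z) \big)
  = \operatorname{Var}\big( \mathbb{E}[ g(z) \mid T ] \big)
  + \mathbb{E}\big[ \operatorname{Var}( g(z) \mid T ) \big]
  \;\ge\; \operatorname{Var}( \tilde{g} )
```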
Abstract
We wish to compute the gradient of an expectation over a finite or countably infinite sample space having K ≤ ∞ categories. When K is indeed infinite, or finite but very large, the relevant summation is intractable. Accordingly, various stochastic gradient estimators have been proposed. In this paper, we describe a technique that can be applied to reduce the variance of any such estimator without changing its bias; in particular, unbiasedness is retained. We show that our technique is an instance of Rao-Blackwellization, and we demonstrate the improvement it yields on a semi-supervised classification problem and a pixel attention task.
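To make the idea concrete, here is a minimal sketch of one way to Rao-Blackwellize a score-function (REINFORCE) gradient estimator for a categorical distribution with K categories. It rests on assumptions not spelled out in the abstract: the distribution is a softmax over logits, the k most probable categories are summed analytically, and a single sample drawn from the remaining categories stands in for the rest of the sum. All function names (`softmax`, `reinforce_term`, `exact_gradient`, `rao_blackwellized_gradient`) are hypothetical, and this should not be read as the authors' implementation.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_term(z, probs, f):
    """Score-function term f(z) * d/dlogits log q(z) = f(z) * (onehot(z) - probs)."""
    return [f(z) * ((1.0 if j == z else 0.0) - p) for j, p in enumerate(probs)]


def exact_gradient(probs, f):
    """Full sum over all K categories; only tractable when K is small."""
    K = len(probs)
    grad = [0.0] * K
    for z in range(K):
        term = reinforce_term(z, probs, f)
        for j in range(K):
            grad[j] += probs[z] * term[j]
    return grad


def rao_blackwellized_gradient(logits, f, k, rng):
    """Assumed scheme: sum the k most probable categories exactly, then add
    one sample from the remaining categories, weighted by their total mass."""
    probs = softmax(logits)
    K = len(probs)
    order = sorted(range(K), key=lambda z: probs[z], reverse=True)
    top, rest = order[:k], order[k:]

    grad = [0.0] * K
    # Analytic part: exact contribution of the high-probability categories.
    for z in top:
        term = reinforce_term(z, probs, f)
        for j in range(K):
            grad[j] += probs[z] * term[j]

    # Stochastic part: one draw from q restricted to the remaining categories.
    rest_mass = sum(probs[z] for z in rest)
    if rest and rest_mass > 0.0:
        v = rng.choices(rest, weights=[probs[z] for z in rest])[0]
        term = reinforce_term(v, probs, f)
        for j in range(K):
            grad[j] += rest_mass * term[j]
    return grad


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [2.0, 1.0, 0.0, -1.0]
    f = lambda z: float(z * z)  # toy objective, chosen only for illustration
    print(rao_blackwellized_gradient(logits, f, k=2, rng=rng))
```

With k = 0 the sketch reduces to the plain single-sample score-function estimator, and with k = K it returns the exact gradient. Intermediate values of k trade extra evaluations of f for lower variance.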
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms
