Rao-Blackwellized Stochastic Gradients for Discrete Distributions

Runjing Liu; Jeffrey Regier; Nilesh Tripuraneni; Michael I. Jordan; and Jon McAuliffe

arXiv:1810.04777·stat.ML·May 14, 2019·5 cites

TL;DR

This paper introduces a Rao-Blackwellization technique that reduces the variance of stochastic gradient estimators for expectations over very large or countably infinite discrete sample spaces. The technique leaves the estimator's bias unchanged, so an unbiased estimator stays unbiased.

Contribution

It presents a variance reduction method, based on Rao-Blackwellization, that can be applied to any stochastic gradient estimator for discrete distributions without changing the estimator's bias.

Findings

01 Significant variance reduction demonstrated on semi-supervised classification.

02 Improved performance on pixel attention task.

03 Technique retains unbiasedness of estimators.

Abstract

We wish to compute the gradient of an expectation over a finite or countably infinite sample space having $K \leq \infty$ categories. When $K$ is indeed infinite, or finite but very large, the relevant summation is intractable. Accordingly, various stochastic gradient estimators have been proposed. In this paper, we describe a technique that can be applied to reduce the variance of any such estimator, without changing its bias; in particular, unbiasedness is retained. We show that our technique is an instance of Rao-Blackwellization, and we demonstrate the improvement it yields on a semi-supervised classification problem and a pixel attention task.
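The idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation or the code in the linked repository. It assumes a categorical distribution parameterized by softmax logits, a fixed value f(z) per category, and the score-function (REINFORCE) estimator as the base estimator. The function names (`softmax`, `reinforce_grad`, `rao_blackwell_grad`, `exact_grad`) and the parameter `k` are hypothetical. The sketch sums the base estimator exactly over the `k` highest-probability categories and draws one sample from the remaining categories, weighted by their total probability mass. Taking the expectation over that single draw recovers the full sum, so the bias of the base estimator is unchanged.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D array of logits.
    e = np.exp(logits - logits.max())
    return e / e.sum()

def reinforce_grad(z, q, f):
    # Base estimator (assumed here): f(z) * d/dlogits log q(z),
    # where d/dlogits log q(z) = one_hot(z) - q for a softmax.
    score = -q.copy()
    score[z] += 1.0
    return f[z] * score

def exact_grad(q, f):
    # Exact gradient of E_q[f(z)] w.r.t. the logits, for reference:
    # sum_z q(z) f(z) (one_hot(z) - q) = q * (f - q.f).
    return q * (f - q @ f)

def rao_blackwell_grad(q, f, k, rng):
    # Sum the base estimator exactly over the k most probable categories.
    top = np.argsort(q)[-k:]
    rest = np.setdiff1d(np.arange(q.size), top)
    grad = sum(q[z] * reinforce_grad(z, q, f) for z in top)
    # Estimate the remainder with one draw from q restricted to the
    # other categories, weighted by their total probability mass.
    tail_mass = q[rest].sum()
    if rest.size > 0 and tail_mass > 0:
        z = rng.choice(rest, p=q[rest] / tail_mass)
        grad = grad + tail_mass * reinforce_grad(z, q, f)
    return grad
```

With `k` equal to the number of categories the sketch reduces to the exact sum. With smaller `k`, the only randomness left is the single tail draw, scaled by the tail mass, which is why the variance shrinks when most of the probability sits in a few categories.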

Code & Models

Repositories

Runjing-Liu120/RaoBlackwellizedSGD
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms