Stochastic Gradient MCMC with Stale Gradients

Changyou Chen, Nan Ding, Chunyuan Li, Yizhe Zhang, and Lawrence Carin

arXiv:1610.06664 · stat.ML · October 24, 2016 · 1 cites

TL;DR

This paper analyzes how stale gradients affect stochastic gradient MCMC algorithms. It shows that the bias and MSE depend on gradient staleness while the estimation variance does not, which supports scalable distributed Bayesian inference.

Contribution

The paper provides a theoretical analysis of SG-MCMC with stale gradients, characterizing their effects on bias, MSE, and estimation variance, and shows that in a distributed setting the estimation variance decreases with a linear speedup in the number of workers.

Findings

01. Bias and MSE depend on gradient staleness

02. Estimation variance is independent of staleness

03. Linear speedup in variance reduction with more workers
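The three quantities named in these findings can be written out as follows. This is a sketch using the standard definitions; the notation (the test function \phi, the sample count L, the symbols \hat{\phi}_L and \bar{\phi}) is assumed here for illustration and is not taken from this page.

```latex
% Assumed notation: \theta_1, ..., \theta_L are the SG-MCMC samples,
% \phi is a test function, and p(\theta \mid \mathcal{D}) is the posterior.
\hat{\phi}_L = \frac{1}{L} \sum_{l=1}^{L} \phi(\theta_l),
\qquad
\bar{\phi} = \int \phi(\theta)\, p(\theta \mid \mathcal{D})\, \mathrm{d}\theta

\text{Bias: } \left| \mathbb{E}\hat{\phi}_L - \bar{\phi} \right|
\qquad
\text{MSE: } \mathbb{E}\left( \hat{\phi}_L - \bar{\phi} \right)^2
\qquad
\text{Variance: } \mathbb{E}\left( \hat{\phi}_L - \mathbb{E}\hat{\phi}_L \right)^2

% Standard decomposition linking the three:
\mathbb{E}\left( \hat{\phi}_L - \bar{\phi} \right)^2
  = \left( \mathbb{E}\hat{\phi}_L - \bar{\phi} \right)^2
  + \mathbb{E}\left( \hat{\phi}_L - \mathbb{E}\hat{\phi}_L \right)^2
```

Under this decomposition, a staleness effect on the bias carries over to the MSE even when the variance term is unchanged, which is consistent with the first two findings.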

Abstract

Stochastic gradient MCMC (SG-MCMC) has played an important role in large-scale Bayesian learning, with well-developed theoretical convergence properties. In such applications of SG-MCMC, it is becoming increasingly popular to employ distributed systems, where stochastic gradients are computed based on some outdated parameters, yielding what are termed stale gradients. While stale gradients could be directly used in SG-MCMC, their impact on convergence properties has not been well studied. In this paper we develop theory to show that while the bias and MSE of an SG-MCMC algorithm depend on the staleness of stochastic gradients, its estimation variance (relative to the expected estimate, based on a prescribed number of samples) is independent of it. In a simple Bayesian distributed system with SG-MCMC, where stale gradients are computed asynchronously by a set of workers, our theory…
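To make the notion of a stale gradient concrete, here is a minimal sketch using stochastic gradient Langevin dynamics (SGLD), one common SG-MCMC algorithm. Everything specific in it is an assumption made for illustration: the toy model (Gaussian likelihood with unit variance and a zero-mean Gaussian prior), the fixed staleness `staleness` (an asynchronous system would typically see varying delays), and the function name `stale_sgld`. It is not the authors' implementation or experimental setup.

```python
import numpy as np


def stale_sgld(data, n_steps=5000, step=1e-4, batch=32,
               staleness=0, prior_var=10.0, seed=0):
    """SGLD on a toy Gaussian-mean model with a fixed gradient delay.

    Assumed model (illustrative only): x_i ~ N(theta, 1), theta ~ N(0, prior_var).
    The stochastic gradient at step l is evaluated at the iterate from
    `staleness` steps earlier, mimicking a gradient computed on outdated
    parameters.
    """
    rng = np.random.default_rng(seed)
    n_data = len(data)
    theta = 0.0
    history = [theta]  # all past iterates, so an outdated one can be looked up
    samples = np.empty(n_steps)
    for l in range(n_steps):
        # Outdated parameter value used for the gradient computation.
        stale_theta = history[max(0, len(history) - 1 - staleness)]
        # Minibatch estimate of the gradient of the log posterior.
        idx = rng.integers(0, n_data, size=batch)
        grad = (-stale_theta / prior_var
                + (n_data / batch) * np.sum(data[idx] - stale_theta))
        # Langevin update: gradient step plus injected Gaussian noise.
        theta = theta + step * grad + np.sqrt(2.0 * step) * rng.standard_normal()
        history.append(theta)
        samples[l] = theta
    return samples


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(1.0, 1.0, size=1000)
    # Exact posterior mean for this conjugate toy model.
    post_mean = data.sum() / (len(data) + 1.0 / 10.0)
    for tau in (0, 5):
        s = stale_sgld(data, staleness=tau)[500:]  # drop burn-in
        print(f"staleness={tau}: |mean error|={abs(s.mean() - post_mean):.4f}, "
              f"sample var={s.var():.5f}")
```

Setting `staleness=0` recovers ordinary SGLD, so runs with different delays can be compared against the same exact posterior. The step size is deliberately small relative to the delay; with a larger step or a longer delay the delayed recursion in this toy model can become unstable.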

Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference