Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning

Ruqi Zhang; Chunyuan Li; Jianyi Zhang; Changyou Chen; Andrew Gordon Wilson

arXiv:1902.03932 · cs.LG · May 13, 2020 · 75 cites

TL;DR

This paper introduces Cyclical Stochastic Gradient MCMC, which uses a cyclical stepsize schedule to explore the high-dimensional, multimodal posteriors over neural network weights that arise in Bayesian deep learning.

Contribution

It proposes a cyclical stepsize schedule for stochastic gradient MCMC (SG-MCMC), proves non-asymptotic convergence of the resulting sampler, and shows that it scales to large datasets such as ImageNet.
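The schedule itself is cheap to compute. The sketch below is a minimal illustration, assuming a cosine-shaped cycle in which K total iterations are split into M equal cycles and the stepsize decays from an initial value alpha0 toward zero within each cycle before resetting. The function name `cyclical_stepsize` and its parameter names are hypothetical, not taken from the authors' code.

```python
import math


def cyclical_stepsize(k: int, total_iters: int, num_cycles: int, alpha0: float) -> float:
    """Cosine cyclical stepsize for iteration k (1-indexed).

    Assumed form (illustrative):
        alpha_k = alpha0 / 2 * (cos(pi * mod(k - 1, ceil(K / M)) / ceil(K / M)) + 1)
    where K = total_iters and M = num_cycles.
    """
    cycle_len = math.ceil(total_iters / num_cycles)
    # Fractional position inside the current cycle, in [0, 1).
    r = ((k - 1) % cycle_len) / cycle_len
    # r = 0 gives alpha0 (large exploratory step); r -> 1 drives the step toward 0.
    return alpha0 / 2 * (math.cos(math.pi * r) + 1)
```

Resetting to alpha0 at the start of every cycle is what produces the large, mode-hopping steps; the tail of each cycle, where the stepsize is close to zero, is where samples around the current mode would be collected.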

Findings

1. Effective exploration of multimodal distributions
2. Proven non-asymptotic convergence guarantees
3. Scalable performance on complex neural networks

Abstract

The posteriors over neural network weights are high dimensional and multimodal. Each mode typically characterizes a meaningfully different representation of the data. We develop Cyclical Stochastic Gradient MCMC (SG-MCMC) to automatically explore such distributions. In particular, we propose a cyclical stepsize schedule, where larger steps discover new modes, and smaller steps characterize each mode. We also prove non-asymptotic convergence of our proposed algorithm. Moreover, we provide extensive experimental results, including ImageNet, to demonstrate the scalability and effectiveness of cyclical SG-MCMC in learning complex multimodal distributions, especially for fully Bayesian inference with modern deep neural networks.
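The division of labour described in the abstract (large steps discover modes, small steps characterize them) can be seen on a toy problem. The sketch below runs stochastic gradient Langevin dynamics with a cosine cyclical stepsize on a hypothetical one-dimensional target with modes at -4 and 4. The target, the Gaussian noise added to the exact gradient as a stand-in for minibatch noise, and the rule of keeping only iterates from the final fraction `beta` of each cycle are all assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

# Hypothetical toy target: equal-weight mixture of N(-4, 1) and N(4, 1).
MEANS = np.array([-4.0, 4.0])
SIGMA = 1.0


def grad_log_density(x: float) -> float:
    """d/dx log p(x) for the two-component Gaussian mixture."""
    logw = -0.5 * ((x - MEANS) / SIGMA) ** 2
    w = np.exp(logw - logw.max())
    w /= w.sum()  # posterior responsibility of each component at x
    return float(np.sum(w * (MEANS - x)) / SIGMA**2)


def cyclical_sgld(total_iters: int = 20000, num_cycles: int = 20, alpha0: float = 1.5,
                  beta: float = 0.8, grad_noise: float = 0.5, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    cycle_len = int(np.ceil(total_iters / num_cycles))
    x = 0.0
    samples = []
    for k in range(1, total_iters + 1):
        r = ((k - 1) % cycle_len) / cycle_len  # position within the current cycle
        alpha = alpha0 / 2 * (np.cos(np.pi * r) + 1)  # cosine cyclical stepsize
        # Assumed stochastic gradient: exact gradient plus Gaussian noise.
        g = grad_log_density(x) + grad_noise * rng.standard_normal()
        # Langevin update: gradient step plus injected noise of variance 2 * alpha.
        x = x + alpha * g + np.sqrt(2 * alpha) * rng.standard_normal()
        if r >= beta:  # small-stepsize tail of the cycle: keep as a sample
            samples.append(x)
    return np.array(samples)
```

With a single small, fixed stepsize the same chain would tend to stay near whichever mode it reached first; the periodic return to large steps is what lets it revisit both modes across cycles.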

Taxonomy

Topics: Machine Learning and Algorithms · Markov Chains and Monte Carlo Methods · Domain Adaptation and Few-Shot Learning