Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning
Ruqi Zhang, Chunyuan Li, Jianyi Zhang, Changyou Chen, Andrew Gordon Wilson

TL;DR
This paper introduces Cyclical Stochastic Gradient MCMC, a novel method with a cyclical stepsize schedule that effectively explores high-dimensional, multimodal neural network weight distributions for Bayesian deep learning.
Contribution
The paper proposes a new cyclical stepsize schedule for SG-MCMC, proves non-asymptotic convergence of the resulting algorithm, and demonstrates scalability on large datasets such as ImageNet.
Findings
Effective exploration of multimodal distributions
Proven non-asymptotic convergence guarantees
Scalable performance on complex neural networks
Abstract
The posteriors over neural network weights are high dimensional and multimodal. Each mode typically characterizes a meaningfully different representation of the data. We develop Cyclical Stochastic Gradient MCMC (SG-MCMC) to automatically explore such distributions. In particular, we propose a cyclical stepsize schedule, where larger steps discover new modes, and smaller steps characterize each mode. We also prove non-asymptotic convergence of our proposed algorithm. Moreover, we provide extensive experimental results, including ImageNet, to demonstrate the scalability and effectiveness of cyclical SG-MCMC in learning complex multimodal distributions, especially for fully Bayesian inference with modern deep neural networks.
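The cyclical stepsize idea can be sketched in a few lines. The abstract does not give the functional form of the schedule, so the cosine shape below is an assumption, as are the names `cyclical_stepsize`, `total_iters`, `num_cycles`, and `initial_stepsize`. Each cycle restarts at the largest stepsize (the exploration phase, where new modes can be discovered) and decays toward zero (the phase where a single mode is characterized).

```python
import math


def cyclical_stepsize(k, total_iters, num_cycles, initial_stepsize):
    """Stepsize for 1-indexed iteration k under an assumed cosine cyclical schedule."""
    # Number of iterations in one cycle (rounded up so the cycles cover all iterations).
    cycle_len = math.ceil(total_iters / num_cycles)
    # Fractional position inside the current cycle, in [0, 1).
    position = ((k - 1) % cycle_len) / cycle_len
    # Cosine decay from initial_stepsize toward 0, restarting at each new cycle.
    return 0.5 * initial_stepsize * (math.cos(math.pi * position) + 1.0)


# Illustrative values only: 12 iterations split into 3 cycles of 4 iterations each.
schedule = [cyclical_stepsize(k, 12, 3, 0.1) for k in range(1, 13)]
for k, stepsize in enumerate(schedule, start=1):
    print(f"iteration {k:2d}: stepsize = {stepsize:.4f}")
```

In an SG-MCMC sampler, this value would take the place of a fixed or monotonically decaying stepsize in the parameter update. Which part of each cycle is used for collecting samples is a separate design choice that this sketch does not cover.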
Taxonomy
Topics: Machine Learning and Algorithms · Markov Chains and Monte Carlo Methods · Domain Adaptation and Few-Shot Learning
