Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization
Changyou Chen, David Carlson, Zhe Gan, Chunyuan Li, Lawrence Carin

TL;DR
This paper connects stochastic gradient MCMC (SG-MCMC) and stochastic optimization by applying simulated annealing to an SG-MCMC algorithm extended with adaptive preconditioners and adaptive element-wise momentum weights. The resulting optimizer is reported to achieve state-of-the-art results on several deep neural network models.
Contribution
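The paper links SG-MCMC to stochastic optimization through simulated annealing: the zero-temperature limit of the extended sampler yields a stochastic optimization method with adaptive element-wise momentum weights, whereas conventional methods use a single shared, static momentum weight. A theoretical analysis suggests that, under certain assumptions, the annealed method converges close to the global optima.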
Findings
Theoretical analysis suggests convergence close to the global optima under certain assumptions
Reports state-of-the-art results on several deep neural network models
Extends SG-MCMC with adaptive preconditioners and adaptive element-wise momentum weights
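The role of annealing in the first finding can be illustrated with the standard tempered target used in simulated annealing. The symbols U (objective, or negative log-posterior), θ (parameters), β (inverse temperature), and T (temperature) are notation assumed here for illustration and are not taken from this summary.

```latex
p_\beta(\theta) \;\propto\; \exp\bigl(-\beta\, U(\theta)\bigr), \qquad \beta = 1/T .
```

At β = 1 this is the target a sampler would normally explore. As β → ∞ (T → 0) the probability mass concentrates on the global minimizers of U, which is the sense in which a sampler run under an increasing β schedule behaves like an optimizer.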
Abstract
Stochastic gradient Markov chain Monte Carlo (SG-MCMC) methods are Bayesian analogs to popular stochastic optimization methods; however, this connection is not well studied. We explore this relationship by applying simulated annealing to an SG-MCMC algorithm. Furthermore, we extend recent SG-MCMC methods with two key components: i) adaptive preconditioners (as in AdaGrad or RMSprop), and ii) adaptive element-wise momentum weights. The zero-temperature limit gives a novel stochastic optimization method with adaptive element-wise momentum weights, while conventional optimization methods only have a shared, static momentum weight. Under certain assumptions, our theoretical analysis suggests the proposed simulated annealing approach converges close to the global optima. Experiments on several deep neural network models show state-of-the-art results compared to related stochastic optimization…
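To make the zero-temperature idea concrete, here is a minimal sketch of an annealed Langevin update with an RMSprop-style diagonal preconditioner. It is not the algorithm proposed in the paper: it has no momentum or element-wise momentum weights, and it ignores the correction term that a state-dependent preconditioner formally requires. The function names, the toy objective, the β schedule, and all hyperparameter values are assumptions made for illustration.

```python
import math
import random


def annealed_preconditioned_step(theta, grad, v, lr, beta, rng,
                                 decay=0.99, eps=1e-5):
    """One annealed Langevin step with an RMSprop-style preconditioner.

    theta, grad, v are equal-length lists of floats; beta is the inverse
    temperature. Returns (new_theta, new_v). Illustrative sketch only.
    """
    new_theta, new_v = [], []
    for t, g, vi in zip(theta, grad, v):
        # Running average of squared gradients, as in RMSprop.
        vi = decay * vi + (1.0 - decay) * g * g
        # Element-wise preconditioner built from that average.
        precond = 1.0 / (math.sqrt(vi) + eps)
        # Injected noise shrinks as beta grows; it is exactly 0 for beta = inf.
        noise_std = math.sqrt(2.0 * lr * precond / beta)
        t = t - lr * precond * g + noise_std * rng.gauss(0.0, 1.0)
        new_theta.append(t)
        new_v.append(vi)
    return new_theta, new_v


def grad_U(x):
    # Gradient of a toy double-well objective U(x) = (x^2 - 1)^2 + 0.3 x
    # (an arbitrary choice for illustration).
    return 4.0 * x * (x * x - 1.0) + 0.3


def run(steps=2000, lr=1e-2, seed=0):
    rng = random.Random(seed)
    theta, v = [1.0], [0.0]
    for t in range(1, steps + 1):
        # Assumed polynomial schedule: inverse temperature grows with t.
        beta = float(t) ** 2
        theta, v = annealed_preconditioned_step(
            theta, [grad_U(theta[0])], v, lr, beta, rng)
    return theta[0]
```

With `beta=math.inf` the noise term vanishes and the step reduces to a plain RMSprop update, which is the simplest instance of a sampler turning into an optimizer in the zero-temperature limit. Early iterations, where β is small, behave like sampling and can move between basins; later iterations behave like preconditioned gradient descent.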
Taxonomy
Topics: Markov Chains and Monte Carlo Methods · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning
Methods: AdaGrad
