Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization

Changyou Chen; David Carlson; Zhe Gan; Chunyuan Li; Lawrence Carin

arXiv:1512.07962 · stat.ML · August 8, 2016 · 49 cites

TL;DR

This paper explores the connection between stochastic gradient MCMC methods and stochastic optimization by applying simulated annealing, introducing adaptive components, and demonstrating state-of-the-art results on neural networks.

Contribution

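The paper links SG-MCMC to stochastic optimization by applying simulated annealing to an SG-MCMC sampler that uses adaptive preconditioners and adaptive element-wise momentum weights. Its theoretical analysis suggests that, under certain assumptions, the annealed method converges close to the global optima.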

Findings

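01 Under certain assumptions, the simulated annealing approach converges close to the global optima

02 Achieves state-of-the-art results on several deep neural network models compared to related stochastic optimization methods

03 Extends SG-MCMC with adaptive preconditioners and adaptive element-wise momentum weights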

Abstract

Stochastic gradient Markov chain Monte Carlo (SG-MCMC) methods are Bayesian analogs to popular stochastic optimization methods; however, this connection is not well studied. We explore this relationship by applying simulated annealing to an SG-MCMC algorithm. Furthermore, we extend recent SG-MCMC methods with two key components: i) adaptive preconditioners (as in AdaGrad or RMSprop), and ii) adaptive element-wise momentum weights. The zero-temperature limit gives a novel stochastic optimization method with adaptive element-wise momentum weights, while conventional optimization methods only have a shared, static momentum weight. Under certain assumptions, our theoretical analysis suggests the proposed simulated annealing approach converges close to the global optima. Experiments on several deep neural network models show state-of-the-art results compared to related stochastic optimization…
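
The annealing idea in the abstract can be illustrated with a much simpler sampler than the one the paper proposes. The sketch below is not the authors' algorithm: it is a plain preconditioned stochastic gradient Langevin update with an RMSprop-style diagonal preconditioner and no momentum, where the injected noise is scaled by an inverse temperature that grows with the iteration count. The function names, the polynomial annealing schedule, the default hyperparameters, and the toy quadratic objective are all assumptions made for illustration. As the temperature approaches zero the noise term vanishes and the update reduces to an RMSprop-like optimizer step.

```python
import numpy as np


def annealed_psgld(grad_fn, theta0, n_steps=2000, step_size=1e-2,
                   decay=0.99, eps=1e-8, anneal_power=1.0, seed=0):
    """Annealed, preconditioned Langevin updates (illustrative sketch only).

    grad_fn(theta, rng) returns a stochastic gradient of the objective.
    The preconditioner correction term used in exact preconditioned
    Langevin samplers is omitted here for brevity (an assumption).
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    v = np.zeros_like(theta)  # running average of squared gradients
    for t in range(1, n_steps + 1):
        g = grad_fn(theta, rng)
        v = decay * v + (1.0 - decay) * g * g
        precond = 1.0 / (np.sqrt(v) + eps)  # RMSprop-style diagonal scaling
        beta = float(t) ** anneal_power     # inverse temperature, grows with t
        noise = rng.standard_normal(theta.shape)
        # Gradient step plus Langevin noise whose variance shrinks as 1 / beta.
        theta = (theta - step_size * precond * g
                 + np.sqrt(2.0 * step_size * precond / beta) * noise)
    return theta


# Hypothetical toy problem: noisy gradients of 0.5 * ||theta - target||^2.
target = np.array([3.0, -2.0])


def noisy_quadratic_grad(theta, rng):
    return (theta - target) + 0.1 * rng.standard_normal(theta.shape)


estimate = annealed_psgld(noisy_quadratic_grad, np.zeros(2))
print(estimate)
```

Setting `anneal_power=0.0` keeps the temperature fixed at one, so the update stays a sampler rather than collapsing toward an optimizer. Comparing the two settings on the toy problem gives a feel for the sampler-to-optimizer transition the abstract describes.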

Code & Models

Repositories

cchangyou/Santa (Official)

Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning

Methods: AdaGrad