Non-convex Learning via Replica Exchange Stochastic Gradient MCMC
Wei Deng, Qi Feng, Liyao Gao, Faming Liang, Guang Lin

TL;DR
This paper introduces an adaptive replica exchange stochastic gradient MCMC method that corrects the biases introduced by mini-batch energy estimates. This enables scalable, accelerated sampling for deep neural networks and achieves state-of-the-art results on CIFAR10, CIFAR100, and SVHN.
Contribution
It proposes an adaptive replica exchange SGMCMC (reSGMCMC) algorithm that corrects biases in mini-batch replica exchange MCMC, improving scalability and convergence in deep learning applications.
Findings
Achieves state-of-the-art results on the CIFAR10, CIFAR100, and SVHN datasets.
Demonstrates an acceleration-accuracy trade-off in the numerical discretization process.
Validates effectiveness through extensive experiments in supervised and semi-supervised learning.
Abstract
Replica exchange Monte Carlo (reMC), also known as parallel tempering, is an important technique for accelerating the convergence of conventional Markov chain Monte Carlo (MCMC) algorithms. However, it requires evaluating the energy function on the full dataset and does not scale to big data. The naïve implementation of reMC in mini-batch settings introduces large biases. As a result, reMC cannot be directly extended to stochastic gradient MCMC (SGMCMC), the standard method for sampling from deep neural networks (DNNs). In this paper, we propose an adaptive replica exchange SGMCMC (reSGMCMC) that automatically corrects the bias, and we study its properties. The analysis implies an acceleration-accuracy trade-off in the numerical discretization of a Markov jump process in a stochastic environment. Empirically, we test the algorithm through…
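The abstract does not state the correction rule, so what follows is only a minimal sketch under stated assumptions, not the authors' implementation. It runs two SGLD chains at temperatures τ_low < τ_high and proposes swaps with rate exp(c(L̃_low − L̃_high)), where c = 1/τ_low − 1/τ_high and L̃ is a mini-batch energy estimate. The sketch assumes that L̃_low − L̃_high is Gaussian with variance 2σ². Under that assumption, the expectation of this rate is inflated by a factor exp(c²σ²), which is the mean of a lognormal. The sketch therefore subtracts cσ² inside the exponent and estimates σ² with a running average. The toy double-well energy, the function names (`minibatch_stats`, `corrected_swap_prob`, `re_sgld`), the variance estimator, and all hyperparameters are hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical sketch: two-chain replica exchange SGLD with a
# variance-corrected swap rate. Not the paper's code; toy problem only.


def per_datum_energy(beta, x):
    # Hypothetical toy loss: a double well in beta, slightly tilted by data point x.
    return 0.25 * (beta ** 2 - 1.0) ** 2 + 0.1 * x * beta


def per_datum_grad(beta, x):
    # Derivative of per_datum_energy with respect to beta.
    return beta * (beta ** 2 - 1.0) + 0.1 * x


def minibatch_stats(beta, data, batch_size, rng):
    """Mini-batch estimates of the full-data energy and gradient, plus an
    estimate of the variance of that energy estimate."""
    n_total = len(data)
    batch = rng.sample(data, batch_size)
    scale = n_total / batch_size
    energies = [per_datum_energy(beta, x) for x in batch]
    energy = scale * sum(energies)
    grad = scale * sum(per_datum_grad(beta, x) for x in batch)
    mean_e = sum(energies) / batch_size
    sample_var = sum((e - mean_e) ** 2 for e in energies) / (batch_size - 1)
    # Var(scale * sum) = N^2 / n * Var(u), ignoring the finite-population correction.
    energy_var = n_total ** 2 / batch_size * sample_var
    return energy, grad, energy_var


def corrected_swap_prob(e_low, e_high, tau_low, tau_high, sigma2):
    """Swap probability with a variance correction, assuming the difference of
    the two mini-batch energies is Gaussian with variance 2 * sigma2.

    E[exp(c * D_hat)] = exp(c * D + c**2 * sigma2) for D_hat ~ N(D, 2 * sigma2),
    so subtracting c * sigma2 inside the bracket removes that upward bias."""
    c = 1.0 / tau_low - 1.0 / tau_high
    log_s = c * (e_low - e_high - c * sigma2)
    # Work in log space to avoid overflow, then truncate at 1.
    return 1.0 if log_s >= 0.0 else math.exp(log_s)


def re_sgld(data, n_iters=3000, lr=5e-4, tau_low=1.0, tau_high=50.0,
            batch_size=50, seed=0):
    """Returns the low-temperature chain's samples and the number of swaps."""
    rng = random.Random(seed)
    betas = [0.0, 0.0]  # index 0: low temperature, index 1: high temperature
    taus = [tau_low, tau_high]
    sigma2 = 0.0        # running estimate of the mini-batch energy variance
    n_swaps = 0
    low_samples = []
    for k in range(1, n_iters + 1):
        # Independent mini-batches per chain, matching the 2 * sigma2 assumption.
        stats = [minibatch_stats(b, data, batch_size, rng) for b in betas]
        # Stochastic-approximation update with step 1/k (a running mean).
        gamma = 1.0 / k
        sigma2 = (1.0 - gamma) * sigma2 + gamma * 0.5 * (stats[0][2] + stats[1][2])
        p = corrected_swap_prob(stats[0][0], stats[1][0], tau_low, tau_high, sigma2)
        if rng.random() < p:
            # Exchange positions (and their cached gradients) between temperatures.
            betas.reverse()
            stats.reverse()
            n_swaps += 1
        # SGLD step: beta <- beta - lr * grad + sqrt(2 * lr * tau) * N(0, 1).
        for i in range(2):
            noise = rng.gauss(0.0, 1.0)
            betas[i] += -lr * stats[i][1] + math.sqrt(2.0 * lr * taus[i]) * noise
        low_samples.append(betas[0])
    return low_samples, n_swaps


if __name__ == "__main__":
    gen = random.Random(1)
    toy_data = [gen.gauss(0.0, 1.0) for _ in range(200)]
    samples, swaps = re_sgld(toy_data)
    print(f"swaps: {swaps}, last low-temperature sample: {samples[-1]:.3f}")
```

Computing the rate in log space avoids overflow when energy gaps are large. Because of the `min(1, ·)` truncation and the plug-in estimate of σ², the resulting swap probability is not exactly unbiased. The sketch only illustrates where a variance correction term enters the swap rule.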
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
Methods: Replica exchange stochastic gradient Langevin dynamics
