Non-convex Learning via Replica Exchange Stochastic Gradient MCMC

Wei Deng; Qi Feng; Liyao Gao; Faming Liang; Guang Lin

arXiv:2008.05367 · stat.ML · March 23, 2021 · 20 cites

TL;DR

This paper introduces an adaptive replica exchange stochastic gradient MCMC (reSGMCMC) method that corrects the bias a naive replica exchange scheme incurs in mini-batch settings. The correction makes accelerated sampling scalable to deep neural network training and yields state-of-the-art results on CIFAR10, CIFAR100, and SVHN.

Contribution

The paper proposes an adaptive reSGMCMC algorithm that corrects the bias of mini-batch replica exchange MCMC, improving scalability and convergence in deep learning applications.

Findings

01 Achieves state-of-the-art results on the CIFAR10, CIFAR100, and SVHN datasets.

02 Shows, through analysis, an acceleration-accuracy trade-off in the numerical discretization of a Markov jump process in a stochastic environment.

03 Validates the method's effectiveness through extensive experiments in supervised and semi-supervised learning.

Abstract

Replica exchange Monte Carlo (reMC), also known as parallel tempering, is an important technique for accelerating the convergence of conventional Markov Chain Monte Carlo (MCMC) algorithms. However, the method requires evaluating the energy function on the full dataset and is therefore not scalable to big data. A naïve implementation of reMC in mini-batch settings introduces large biases, so reMC cannot be directly extended to stochastic gradient MCMC (SGMCMC), the standard sampling method for simulating from deep neural networks (DNNs). In this paper, we propose an adaptive replica exchange SGMCMC (reSGMCMC) to automatically correct the bias and study the corresponding properties. The analysis implies an acceleration-accuracy trade-off in the numerical discretization of a Markov jump process in a stochastic environment. Empirically, we test the algorithm through…
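To make the idea in the abstract concrete, here is a minimal sketch of replica exchange stochastic gradient Langevin dynamics on a one-dimensional toy problem. It is an illustration under stated assumptions, not the authors' implementation: the double-well `energy`, the additive Gaussian noise that stands in for mini-batch error, the fixed and known noise variance, and every function and parameter name are hypothetical. The swap correction shown is the standard log-normal mean correction for Gaussian noise; the adaptive variance estimation that the paper proposes is not reproduced here.

```python
import numpy as np


def energy(x):
    """Toy double-well energy standing in for a full-data loss (hypothetical)."""
    return (x ** 2 - 1.0) ** 2


def grad_energy(x):
    """Exact gradient of the toy energy."""
    return 4.0 * x * (x ** 2 - 1.0)


def corrected_log_swap_rate(e_low, e_high, d_tau, var_diff):
    """Log swap rate with a Gaussian-noise bias correction (assumption).

    If the noise in (e_low - e_high) is N(0, var_diff), then
    E[exp(d_tau * noise)] = exp(d_tau**2 * var_diff / 2), so subtracting
    d_tau * var_diff / 2 inside the bracket removes that inflation.
    """
    return d_tau * (e_low - e_high - d_tau * var_diff / 2.0)


def resgld(n_steps=5000, lr=1e-3, tau_low=0.1, tau_high=1.0,
           noise_std=0.5, seed=0):
    """Run two Langevin chains at different temperatures with noisy swaps."""
    rng = np.random.default_rng(seed)
    x_low, x_high = -1.0, 1.0              # start the chains in different wells
    d_tau = 1.0 / tau_low - 1.0 / tau_high
    var_diff = 2.0 * noise_std ** 2        # variance of the noisy energy difference
    samples = np.empty(n_steps)
    n_swaps = 0
    for k in range(n_steps):
        # Noisy gradients mimic mini-batch gradient estimates.
        g_low = grad_energy(x_low) + noise_std * rng.standard_normal()
        g_high = grad_energy(x_high) + noise_std * rng.standard_normal()
        # Langevin updates: the high-temperature chain injects more noise.
        x_low += -lr * g_low + np.sqrt(2.0 * lr * tau_low) * rng.standard_normal()
        x_high += -lr * g_high + np.sqrt(2.0 * lr * tau_high) * rng.standard_normal()
        # Noisy energy estimates mimic mini-batch loss evaluations.
        e_low = energy(x_low) + noise_std * rng.standard_normal()
        e_high = energy(x_high) + noise_std * rng.standard_normal()
        log_rate = corrected_log_swap_rate(e_low, e_high, d_tau, var_diff)
        # Simplification: swap with probability min(1, S) at every step.
        if rng.random() < np.exp(min(0.0, log_rate)):
            x_low, x_high = x_high, x_low
            n_swaps += 1
        samples[k] = x_low
    return samples, n_swaps
```

Calling `samples, n_swaps = resgld()` returns the low-temperature trajectory and the number of accepted swaps. Passing `var_diff=0.0` to `corrected_log_swap_rate` recovers the uncorrected rate, which is one way to see how strongly the correction suppresses swaps as the energy noise grows.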

Code & Models

Videos

Non-convex Learning via Replica Exchange Stochastic Gradient MCMC · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications

Methods: Replica exchange stochastic gradient Langevin dynamics