CoolMomentum: A Method for Stochastic Optimization by Langevin Dynamics with Simulated Annealing

Oleksandr Borysenko; Maksym Byshkin

arXiv:2005.14605 · stat.ML · May 24, 2021

TL;DR

CoolMomentum is a stochastic optimization method inspired by Langevin dynamics and simulated annealing. It gradually decreases the momentum coefficient during training to improve the optimization of deep neural networks.

Contribution

The paper introduces CoolMomentum, a new optimization algorithm that integrates Langevin dynamics with momentum decay, inspired by physical annealing processes.

Findings

01 Achieves high accuracy with ResNet-20 on CIFAR-10

02 Is effective with EfficientNet-B0 on ImageNet

03 Provides a physics-inspired perspective on stochastic optimization

Abstract

Deep learning applications require global optimization of non-convex objective functions, which have multiple local minima. The same problem is often found in physical simulations and may be resolved by the methods of Langevin dynamics with Simulated Annealing, a well-established approach for the minimization of many-particle potentials. This analogy provides useful insights for non-convex stochastic optimization in machine learning. Here we find that integration of the discretized Langevin equation gives a coordinate updating rule equivalent to the famous Momentum optimization algorithm. As a main result, we show that a gradual decrease of the momentum coefficient from an initial value close to unity down to zero is equivalent to the application of Simulated Annealing, or slow cooling in physical terms. Making use of this novel approach, we propose CoolMomentum -- a new stochastic…
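
The idea in the abstract, a Momentum update whose momentum coefficient is lowered gradually from a value near unity to zero, can be illustrated with a minimal sketch. This is not the authors' implementation. The linear decay in `momentum_schedule`, the toy objective, the learning rate, and all function names are assumptions made for illustration only; the paper's actual cooling schedule is not given in the text above.

```python
def momentum_schedule(step, total_steps, rho0=0.99):
    """Hypothetical schedule: linear decay of the momentum coefficient
    from rho0 (close to unity) to 0. Assumed for illustration; the
    paper's own schedule is not reproduced here."""
    frac = min(step / total_steps, 1.0)
    return rho0 * (1.0 - frac)


def minimize(grad, x0, lr=0.01, total_steps=2000, rho0=0.99):
    """Classical Momentum update with a decaying momentum coefficient."""
    x = x0
    dx = 0.0  # previous coordinate update (plays the role of velocity)
    for step in range(total_steps):
        rho = momentum_schedule(step, total_steps, rho0)
        dx = rho * dx - lr * grad(x)  # high rho early: weakly damped motion
        x = x + dx                    # low rho late: plain gradient descent
    return x


def grad_f(x):
    """Gradient of the toy non-convex objective f(x) = x**4 - 3*x**2 + x,
    which has two minima separated by a barrier."""
    return 4 * x**3 - 6 * x + 1


x_final = minimize(grad_f, x0=2.0)
print(x_final, grad_f(x_final))
```

Early in the run the coefficient is close to one, so the iterate keeps most of its previous update and can cross barriers between minima. As the coefficient falls toward zero the motion is damped more strongly and the iterate settles into a minimum, which is the "slow cooling" picture described in the abstract.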

Code & Models

Repositories

borbysh/coolmomentum
tf · Official