Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance

A. Barakat, P. Bianchi, W. Hachem, and Sh. Schechtman

arXiv:2012.04002 · math.OC · July 13, 2021

TL;DR

This paper analyzes a unified stochastic optimization framework that covers popular methods such as Adam and S-NAG and, by studying an associated differential equation, proves stability, convergence, and trap avoidance in non-convex settings.

Contribution

The paper analyzes a general family of momentum-based stochastic optimization algorithms as noisy Euler discretizations of a non-autonomous ordinary differential equation, and uses this viewpoint to establish convergence and trap avoidance results.
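To make the "noisy discretization" viewpoint concrete, the block below uses a simplified momentum system of our own choosing; the ODE of Belotto da Silva and Gazeau analyzed in the paper is more general, since it must also cover Adam's adaptive scaling. Here h and r are hypothetical time-varying coefficients, γ_n are step sizes, τ_n is the corresponding ODE time, and η_{n+1} is zero-mean gradient noise. The last line eliminates m to recover a second-order form.

```latex
\begin{aligned}
\dot{x}(t) &= -m(t), \qquad \dot{m}(t) = h(t)\,\nabla f(x(t)) - r(t)\,m(t), \\[4pt]
\tau_n &= \tau_0 + \sum_{k=1}^{n} \gamma_k, \\
m_{n+1} &= m_n + \gamma_{n+1}\bigl( h(\tau_n)\,(\nabla f(x_n) + \eta_{n+1}) - r(\tau_n)\,m_n \bigr), \\
x_{n+1} &= x_n - \gamma_{n+1}\, m_n, \\[4pt]
\ddot{x} &= -\dot{m} = -h(t)\,\nabla f(x) + r(t)\,m = -h(t)\,\nabla f(x) - r(t)\,\dot{x}
\;\Longrightarrow\; \ddot{x} + r(t)\,\dot{x} + h(t)\,\nabla f(x) = 0.
\end{aligned}
```

With h ≡ 1 and a constant r this is the heavy-ball ODE; with h ≡ 1 and r(t) = 3/t (taking τ_0 > 0) it is the ODE commonly associated with Nesterov's accelerated gradient method.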

Findings

01. Proves almost sure convergence to critical points in non-convex settings.
02. Establishes convergence rates via a Central Limit Theorem.
03. Shows non-convergence to saddle points and maxima under certain conditions.
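A small numerical illustration of findings 01 and 03, under assumptions that are ours and not the paper's: the recursion is a noisy explicit Euler discretization of dx/dt = -m, dm/dt = grad f(x) - r m (a stochastic heavy-ball-type special case with constant damping r, not the paper's general algorithm). The test function f(x, y) = x²/2 + (y² - 1)²/4, the Gaussian noise level sigma, and the step sizes gamma_n = gamma0 · n^(-0.6) are hypothetical choices. f has a strict saddle at (0, 0) and minimizers at (0, 1) and (0, -1).

```python
import math
import random


def grad_f(x, y):
    """Gradient of the hypothetical test function f(x, y) = x**2 / 2 + (y**2 - 1)**2 / 4.

    f has a strict saddle point at (0, 0) and two minimizers at (0, 1) and (0, -1).
    """
    return x, y**3 - y


def noisy_heavy_ball(x0, y0, n_iter=20000, r=1.0, gamma0=0.3, sigma=0.1, seed=0):
    """Noisy explicit Euler scheme for dx/dt = -m, dm/dt = grad f(x) - r * m.

    Step sizes gamma_n = gamma0 * n**-0.6 satisfy sum(gamma_n) = inf and
    sum(gamma_n**2) < inf. Gradient noise is i.i.d. Gaussian with std sigma
    (an assumption made for this illustration only).
    """
    rng = random.Random(seed)
    x, y = x0, y0
    mx, my = 0.0, 0.0  # momentum variables, started at rest
    for n in range(1, n_iter + 1):
        gamma = gamma0 * n ** -0.6
        gx, gy = grad_f(x, y)
        # Noisy gradient oracle: exact gradient plus Gaussian noise.
        gx += sigma * rng.gauss(0.0, 1.0)
        gy += sigma * rng.gauss(0.0, 1.0)
        # Explicit Euler: every increment uses the current state (x, y, mx, my).
        x, y, mx, my = (
            x - gamma * mx,
            y - gamma * my,
            mx + gamma * (gx - r * mx),
            my + gamma * (gy - r * my),
        )
    return x, y


# Started exactly at the saddle: without noise the iterates never move ...
print(noisy_heavy_ball(0.0, 0.0, sigma=0.0))  # (0.0, 0.0)
# ... while with noise they escape and settle near (0, 1) or (0, -1).
print(noisy_heavy_ball(0.0, 0.0, sigma=0.1))
```

The noise-free run stays at the critical point forever because the gradient vanishes there; the noisy run leaves the unstable direction of the saddle, which is the intuition behind trap avoidance results. The sketch only illustrates the phenomenon and proves nothing.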

Abstract

In this paper, a general stochastic optimization procedure is studied, unifying several variants of stochastic gradient descent, among them the stochastic heavy ball method, the Stochastic Nesterov Accelerated Gradient algorithm (S-NAG), and the widely used Adam algorithm. The algorithm is seen as a noisy Euler discretization of a non-autonomous ordinary differential equation, recently introduced by Belotto da Silva and Gazeau, which is analyzed in depth. Assuming that the objective function is non-convex and differentiable, the stability and the almost sure convergence of the iterates to the set of critical points are established. A noteworthy special case is the convergence proof of S-NAG in a non-convex setting. Under some assumptions, the convergence rate is provided in the form of a Central Limit Theorem. Finally, the non-convergence of the algorithm to undesired…
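One way to see why Adam fits a discretization viewpoint: its exponential moving averages can be rewritten as Euler-type increments of size 1 - beta1 and 1 - beta2 that relax the moment estimates toward the current gradient and squared gradient. The sketch below is the standard scalar Adam update (Kingma and Ba) written in that form, with the usual default constants; how the paper itself maps Adam's constants onto its step sizes and ODE coefficients is not reproduced here.

```python
import math


def adam_step(x, m, v, g, k, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One scalar Adam update, with the moving averages written as Euler-type increments.

    x: current iterate, m / v: first / second moment estimates,
    g: stochastic gradient at x, k: 1-based iteration counter (for bias correction).
    """
    # beta1 * m + (1 - beta1) * g  ==  m + (1 - beta1) * (g - m):
    # a step of size (1 - beta1) that relaxes m toward the current gradient.
    m = m + (1.0 - beta1) * (g - m)
    # Same for the second moment, relaxed toward the squared gradient.
    v = v + (1.0 - beta2) * (g * g - v)
    # Standard bias corrections for the zero initialization of m and v.
    m_hat = m / (1.0 - beta1 ** k)
    v_hat = v / (1.0 - beta2 ** k)
    x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
    return x, m, v


# Usage: exact gradients of f(x) = (x - 3)**2 / 2, starting from x = 0.
x, m, v = 0.0, 0.0, 0.0
for k in range(1, 6):
    x, m, v = adam_step(x, m, v, x - 3.0, k)
print(x)  # approximately 0.005: early Adam steps have magnitude close to lr
```

The increment form makes the "step size times drift" structure of each moving average explicit, which is the shape a noisy Euler scheme has.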

Taxonomy

Methods: Nesterov Accelerated Gradient · Adam