Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance
A. Barakat, P. Bianchi, W. Hachem, and Sh. Schechtman

TL;DR
This paper analyzes a unified stochastic optimization framework, including popular methods like Adam and S-NAG, proving convergence, stability, and trap avoidance in non-convex settings through differential equation analysis.
Contribution
It introduces a comprehensive analysis of stochastic optimization algorithms as noisy discretizations of differential equations, establishing convergence and trap avoidance results.
Findings
Proves almost sure convergence to critical points in non-convex settings.
Establishes convergence rates via a Central Limit Theorem.
Shows non-convergence to saddle points and maxima under certain conditions.
Abstract
In this paper, a general stochastic optimization procedure is studied, unifying several variants of stochastic gradient descent such as, among others, the stochastic heavy ball method, the Stochastic Nesterov Accelerated Gradient algorithm (S-NAG), and the widely used Adam algorithm. The algorithm is seen as a noisy Euler discretization of a non-autonomous ordinary differential equation, recently introduced by Belotto da Silva and Gazeau, which is analyzed in depth. Assuming that the objective function is non-convex and differentiable, the stability and the almost sure convergence of the iterates to the set of critical points are established. A noteworthy special case is the convergence proof of S-NAG in a non-convex setting. Under some assumptions, the convergence rate is provided in the form of a Central Limit Theorem. Finally, the non-convergence of the algorithm to undesired…
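The "noisy Euler discretization" viewpoint can be illustrated with a minimal sketch. It is not the paper's general scheme or its non-autonomous ODE. It assumes only the simplest member of the family, the stochastic heavy ball method, read as explicit Euler steps on the classical system x' = v, v' = -a v - ∇F(x), with the gradient replaced by a noisy estimate. The double-well objective, the function names, the damping value, the Gaussian noise, and the step-size schedule are all hypothetical choices made for illustration.

```python
import random


def grad_double_well(x):
    """Gradient of the illustrative objective F(x) = (x**2 - 1)**2 / 4.

    F is non-convex: critical points at x = -1 and x = 1 (minima)
    and at x = 0 (a local maximum, i.e. a "trap").
    """
    return x**3 - x


def stochastic_heavy_ball(grad, x0, n_steps=100_000, gamma0=0.1,
                          damping=1.0, noise_std=0.5, seed=0):
    """Explicit Euler steps on x' = v, v' = -damping * v - grad F(x),
    with grad F replaced by a noisy estimate (assumed Gaussian noise)."""
    rng = random.Random(seed)
    x, v = x0, 0.0
    for n in range(n_steps):
        # Decreasing step sizes (assumed schedule): the sum of gamma
        # diverges while the sum of gamma**2 converges.
        gamma = gamma0 / (n + 1) ** 0.6
        noisy_grad = grad(x) + rng.gauss(0.0, noise_std)
        # Simultaneous update: both right-hand sides use the old (x, v).
        x, v = x + gamma * v, v + gamma * (-damping * v - noisy_grad)
    return x


# Start exactly at the local maximum x = 0.
x_final = stochastic_heavy_ball(grad_double_well, x0=0.0)
print(x_final)
```

Without noise, an iterate started at x = 0 with zero velocity would stay there forever, since the gradient vanishes. With gradient noise, one would expect the iterate to leave the local maximum and settle near one of the minima at ±1, which is the qualitative behavior the convergence and trap-avoidance results describe.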
Taxonomy
Methods: Nesterov Accelerated Gradient · Adam
