Momentum and Stochastic Momentum for Stochastic Gradient, Newton, Proximal Point and Subspace Descent Methods
Nicolas Loizou, Peter Richtárik

TL;DR
This paper introduces and analyzes stochastic momentum variants for several stochastic optimization algorithms, proving their convergence rates and demonstrating potential efficiency improvements over deterministic momentum in certain regimes.
Contribution
It is the first study of momentum variants for several of these stochastic methods. It establishes their convergence properties and proposes stochastic momentum as a way to reduce the per-iteration computational cost.
Findings
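Proved global nonasymptotic linear convergence rates for the momentum variants of all methods studied, including the stochastic heavy ball method.
Showed that the primal iterates converge at an accelerated linear rate in the L1 sense.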
Demonstrated improved complexity of stochastic momentum methods in sparse data regimes.
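Stochastic momentum can be illustrated with a minimal sketch. It assumes the consistent linear-system setting Ax = b and uses randomized Kaczmarz projections (row i sampled with probability proportional to ||a_i||^2) as the stochastic gradient step. It also assumes the single-coordinate form of the momentum term, x_{k+1} = x_k - ω ∇f_{S_k}(x_k) + nβ e_j^T(x_k - x_{k-1}) e_j, with j drawn uniformly from the n coordinates. Because E[n e_j e_j^T] = I, this term equals the heavy ball term β(x_k - x_{k-1}) in expectation. The function name kaczmarz_stochastic_momentum and its default parameters are hypothetical choices for illustration, not taken from the paper.

```python
import random


def kaczmarz_stochastic_momentum(A, b, omega=1.0, beta=0.2, iters=5000, seed=0):
    """Randomized Kaczmarz with stochastic (single-coordinate) momentum for Ax = b.

    Hypothetical illustration; assumes the system is consistent. Each iteration applies
        x_{k+1} = x_k - omega * (a_i^T x_k - b_i) / ||a_i||^2 * a_i
                  + n * beta * (x_k[j] - x_{k-1}[j]) * e_j
    with row i ~ ||a_i||^2 and coordinate j uniform on {0, ..., n-1}.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    x_prev = [0.0] * n  # x_{k-1}; equal to x_0 so the first step has no momentum
    x = [0.0] * n       # x_k
    row_norms_sq = [sum(v * v for v in row) for row in A]
    for _ in range(iters):
        # Sample a row with probability proportional to its squared norm.
        i = rng.choices(range(m), weights=row_norms_sq)[0]
        a_i = A[i]
        residual = sum(a * xj for a, xj in zip(a_i, x)) - b[i]
        step = omega * residual / row_norms_sq[i]
        # Stochastic gradient (projection) step.
        x_next = [xj - step * a for xj, a in zip(x, a_i)]
        # Momentum on one random coordinate, scaled by n so it is unbiased
        # for beta * (x_k - x_{k-1}).
        j = rng.randrange(n)
        x_next[j] += n * beta * (x[j] - x_prev[j])
        x_prev, x = x, x_next
    return x


if __name__ == "__main__":
    A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
    b = [4.0, 7.0, -1.0]  # consistent system with solution x* = (1, 2)
    print(kaczmarz_stochastic_momentum(A, b))
```

For readability the sketch rebuilds the whole iterate on each step, so it does not realize any saving. An efficient version would update x in place, touching only the nonzeros of a_i plus the single coordinate j. That is the kind of per-iteration saving the sparse-regime result refers to, since full momentum always costs O(n).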
Abstract
In this paper we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time momentum variants of several of these methods are studied. We choose to perform our analysis in a setting in which all of the above methods are equivalent. We prove global nonasymptotic linear convergence rates for all methods and various measures of success, including primal function values, primal iterates (in L2 sense), and dual function values. We also show that the primal iterates converge at an accelerated linear rate in the L1 sense. This is the first time a linear rate is shown for the stochastic heavy ball method (i.e., stochastic gradient descent method with momentum). Under somewhat weaker conditions,…
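The stochastic heavy ball method can be sketched in the same kind of setting. The example below assumes a consistent system Ax = b and takes the stochastic gradient step to be a randomized Kaczmarz projection onto the hyperplane of a sampled row a_i, which is one special case of the framework. The update is x_{k+1} = x_k - ω ∇f_{S_k}(x_k) + β(x_k - x_{k-1}), with ω the stepsize and β the momentum parameter. The function name heavy_ball_kaczmarz and its defaults are hypothetical.

```python
import random


def heavy_ball_kaczmarz(A, b, omega=1.0, beta=0.3, iters=3000, seed=0):
    """Randomized Kaczmarz with heavy ball momentum for a consistent system Ax = b.

    Hypothetical illustration. Each iteration samples a row i and applies
        x_{k+1} = x_k - omega * (a_i^T x_k - b_i) / ||a_i||^2 * a_i + beta * (x_k - x_{k-1}).
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    x_prev = [0.0] * n  # x_{k-1}; equal to x_0 so the first step has no momentum
    x = [0.0] * n       # x_k
    row_norms_sq = [sum(v * v for v in row) for row in A]
    for _ in range(iters):
        # Sample a row with probability proportional to its squared norm.
        i = rng.choices(range(m), weights=row_norms_sq)[0]
        a_i = A[i]
        residual = sum(a * xj for a, xj in zip(a_i, x)) - b[i]
        step = omega * residual / row_norms_sq[i]
        # Projection step plus the full heavy ball momentum term.
        x_next = [xj - step * a + beta * (xj - xp) for xj, a, xp in zip(x, a_i, x_prev)]
        x_prev, x = x, x_next
    return x


if __name__ == "__main__":
    A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
    b = [4.0, 7.0, -1.0]  # consistent system with solution x* = (1, 2)
    print(heavy_ball_kaczmarz(A, b))
```

Setting beta=0 recovers plain randomized Kaczmarz. The default beta=0.3 is only for illustration; the paper's theory states its own conditions on ω and β, which this sketch does not check.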
