On the Influence of Momentum Acceleration on Online Learning
Kun Yuan, Bicheng Ying, and Ali H. Sayed

TL;DR
This paper analyzes how momentum acceleration affects online stochastic gradient learning. It shows that momentum methods are equivalent to standard stochastic gradient methods with a re-scaled step-size, which limits their advantage in adaptive online settings.
Contribution
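It establishes that momentum stochastic gradient methods are equivalent to the standard stochastic gradient method with a re-scaled step-size. The equivalence holds at all time instants, not only in steady state, and is proved for general strongly convex and smooth risks rather than only for quadratic ones.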
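For intuition, the heavy-ball form of the momentum recursion is written below. Here $\widehat{\nabla J}$ denotes a stochastic gradient, $\mu$ the step-size, and $\beta \in [0,1)$ the momentum parameter. This notation is ours, and the paper treats a more general momentum family with a rigorous proof. The middle line is only a heuristic. It assumes that successive increments are nearly equal, which is roughly the slow-adaptation regime of small $\mu$. Under that assumption it suggests the re-scaled step $\mu' = \mu/(1-\beta)$, consistent with the statement that the re-scaling is set by the momentum parameter.

```latex
\begin{aligned}
&\text{momentum:} && w_i = w_{i-1} - \mu\,\widehat{\nabla J}(w_{i-1}) + \beta\,(w_{i-1} - w_{i-2}) \\
&\text{heuristic:} && \Delta_i \triangleq w_i - w_{i-1} \approx \Delta_{i-1}
   \;\Rightarrow\; \Delta_i \approx -\mu\,\widehat{\nabla J}(w_{i-1}) + \beta\,\Delta_i
   \;\Rightarrow\; \Delta_i \approx -\frac{\mu}{1-\beta}\,\widehat{\nabla J}(w_{i-1}) \\
&\text{standard:} && w_i = w_{i-1} - \mu'\,\widehat{\nabla J}(w_{i-1}), \qquad \mu' = \frac{\mu}{1-\beta}
\end{aligned}
```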
Findings
Momentum methods are equivalent to standard methods with a larger step-size.
Momentum benefits do not necessarily translate to online adaptive settings.
The equivalence is also observed in non-differentiable and non-convex problems.
Abstract
The article examines in some detail the convergence rate and mean-square-error performance of momentum stochastic gradient methods in the constant step-size and slow adaptation regime. The results establish that momentum methods are equivalent to the standard stochastic gradient method with a re-scaled (larger) step-size value. The size of the re-scaling is determined by the value of the momentum parameter. The equivalence result is established for all time instants and not only in steady-state. The analysis is carried out for general strongly convex and smooth risk functions, and is not limited to quadratic risks. One notable conclusion is that the well-known benefits of momentum constructions for deterministic optimization problems do not necessarily carry over to the adaptive online setting when small constant step-sizes are used to enable continuous adaptation and learning in the…
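The equivalence can be checked numerically with a small simulation. The sketch below uses a hypothetical setup, not the authors' experiment. It runs streaming linear regression (LMS) with white Gaussian regressors, dimension 10, and noise variance 0.01. The step-size μ = 0.005 and momentum β = 0.5 are arbitrary choices. The helper names `make_stream` and `run_lms` are illustrative only. The sketch compares the steady-state mean-square deviation (MSD) of three methods: heavy-ball momentum LMS, standard LMS with the re-scaled step μ/(1-β), and standard LMS with the unscaled μ.

```python
import numpy as np


def make_stream(num_iters, dim, noise_var, seed=0):
    """Hypothetical streaming regression data: d_i = u_i^T w_o + v_i."""
    rng = np.random.default_rng(seed)
    w_o = rng.standard_normal(dim)                      # unknown model to be learned
    U = rng.standard_normal((num_iters, dim))           # white regressors, R_u = I
    v = np.sqrt(noise_var) * rng.standard_normal(num_iters)
    return w_o, U, U @ w_o + v


def run_lms(U, d, w_o, mu, beta=0.0):
    """Heavy-ball momentum LMS; beta = 0 reduces to standard LMS (plain SGD).

    Returns the squared deviation ||w_o - w_i||^2 after every update.
    """
    w_prev = np.zeros(U.shape[1])   # w_{i-2}
    w = np.zeros(U.shape[1])        # w_{i-1}
    sq_dev = np.empty(len(d))
    for i, (u, d_i) in enumerate(zip(U, d)):
        err = d_i - u @ w                                 # instantaneous error
        w_next = w + mu * err * u + beta * (w - w_prev)   # stochastic gradient step + momentum
        w_prev, w = w, w_next
        sq_dev[i] = np.sum((w_o - w) ** 2)
    return sq_dev


num_iters, burn_in, dim, noise_var = 30_000, 10_000, 10, 0.01
mu, beta = 0.005, 0.5
w_o, U, d = make_stream(num_iters, dim, noise_var)

# Steady-state MSD: time average after discarding the transient.
msd_mom = run_lms(U, d, w_o, mu, beta)[burn_in:].mean()
msd_scaled = run_lms(U, d, w_o, mu / (1 - beta))[burn_in:].mean()
msd_plain = run_lms(U, d, w_o, mu)[burn_in:].mean()
theory = (mu / (1 - beta)) * noise_var * dim / 2  # small-step LMS approximation mu' * sigma_v^2 * M / 2

print(f"momentum (mu={mu}, beta={beta}):  {msd_mom:.2e}")
print(f"standard, mu/(1-beta):           {msd_scaled:.2e}")
print(f"standard, mu:                    {msd_plain:.2e}")
print(f"approximation mu' sigma_v^2 M/2: {theory:.2e}")
```

If the equivalence holds, the first two figures should be close to each other and to the small-step approximation. The unscaled standard run should give roughly half that value, consistent with its smaller step-size. All three runs share the same data stream, so the comparison reflects the recursions rather than sampling differences.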
Taxonomy
Methods: Spatio-temporal stability analysis
