On the Influence of Momentum Acceleration on Online Learning

Kun Yuan, Bicheng Ying, and Ali H. Sayed

arXiv:1603.04136 · math.OC · October 13, 2016

TL;DR

This paper analyzes how momentum acceleration affects online stochastic gradient learning. It shows that momentum methods are equivalent to standard stochastic gradient methods with a re-scaled step-size, so their benefits need not carry over to adaptive online settings that use small constant step-sizes.

Contribution

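It establishes that momentum stochastic gradient methods are equivalent to the standard stochastic gradient method with a re-scaled step-size at all time instants, not only in steady state, and proves this for general strongly convex and smooth risks rather than only for quadratic ones.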
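To make the step-size re-scaling concrete, here is one common momentum form (heavy-ball) written next to the standard recursion. The notation is an assumption for illustration: w_i is the iterate, \widehat{\nabla J} the stochastic gradient, μ the step-size and β the momentum parameter; the paper's exact parameterization may differ. The last line is only a heuristic for where the factor 1/(1−β) comes from (assuming zero initial momentum, Δ_0 = 0, and a gradient that changes little over the roughly 1/(1−β)-step memory window when μ is small), not the paper's proof.

```latex
\begin{aligned}
\text{momentum:}\quad & w_i = w_{i-1} - \mu\,\widehat{\nabla J}(w_{i-1}) + \beta\,(w_{i-1} - w_{i-2}), \\
\text{standard:}\quad & w'_i = w'_{i-1} - \mu'\,\widehat{\nabla J}(w'_{i-1}),
  \qquad \mu' = \frac{\mu}{1-\beta}, \\
\text{heuristic:}\quad & \Delta_i \triangleq w_i - w_{i-1}
  = \beta\,\Delta_{i-1} - \mu\,\widehat{\nabla J}(w_{i-1})
  = -\mu \sum_{k=0}^{i-1} \beta^{k}\,\widehat{\nabla J}(w_{i-1-k})
  \approx -\frac{\mu}{1-\beta}\,\widehat{\nabla J}(w_{i-1}).
\end{aligned}
```

Since 0 ≤ β < 1, the effective step-size μ/(1−β) is larger than μ, which matches the "re-scaled (larger) step-size" in the abstract.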

Findings

01. Momentum methods are equivalent to standard stochastic gradient methods with a re-scaled (larger) step-size.

02. The benefits of momentum in deterministic optimization do not necessarily carry over to online adaptive settings with small constant step-sizes.

03. The equivalence is also observed for non-differentiable and non-convex problems.
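The first two findings can be checked numerically in a toy setting. The sketch below is a minimal simulation, assuming a diagonal quadratic risk J(w) = ½ Σ h_k (w_k − w°_k)² with additive white Gaussian gradient noise, which is far narrower than the paper's general strongly convex setting. The function simulate, the curvatures h, the minimizer w_opt and the values μ = 0.01, β = 0.5 are hypothetical choices. It estimates the steady-state mean-square deviation (MSD) of heavy-ball momentum SGD with step-size μ, of standard SGD with the re-scaled step-size μ/(1−β), and of standard SGD with the unscaled step-size μ.

```python
import numpy as np


def simulate(step, beta, h, w_opt, sigma, iters, runs, tail, seed):
    """Heavy-ball SGD on the toy risk J(w) = 0.5 * sum(h * (w - w_opt)**2)
    with additive white Gaussian gradient noise (beta = 0 gives plain SGD).

    Returns the mean-square deviation ||w_i - w_opt||^2, averaged over the
    last `tail` iterations and over `runs` independent runs.
    """
    rng = np.random.default_rng(seed)
    dim = h.size
    w_prev = np.zeros((runs, dim))  # w_{i-2}; equal to w_{i-1}, so zero initial momentum
    w = np.zeros((runs, dim))       # w_{i-1}
    msd_sum = 0.0
    for i in range(iters):
        noise = sigma * rng.standard_normal((runs, dim))
        grad = h * (w - w_opt) + noise                  # noisy gradient at w_{i-1}
        w_next = w - step * grad + beta * (w - w_prev)  # momentum update
        w_prev, w = w, w_next
        if i >= iters - tail:                           # steady-state averaging window
            msd_sum += np.mean(np.sum((w - w_opt) ** 2, axis=1))
    return msd_sum / tail


# Hypothetical problem and parameters, chosen only for illustration.
h = np.array([1.0, 0.5])       # diagonal of the Hessian
w_opt = np.array([1.0, -2.0])  # minimizer
mu, beta, sigma = 0.01, 0.5, 1.0
# Same seed for every run: common random numbers make the comparison less noisy.
common = dict(h=h, w_opt=w_opt, sigma=sigma, iters=3000, runs=1000, tail=1000, seed=0)

msd_momentum = simulate(mu, beta, **common)              # momentum, step mu
msd_rescaled = simulate(mu / (1 - beta), 0.0, **common)  # SGD, step mu / (1 - beta)
msd_plain = simulate(mu, 0.0, **common)                  # SGD, step mu

# Small-step MSD approximation for SGD with isotropic noise: (step / 2) * sigma^2 * tr(H^{-1})
approx = (mu / (1 - beta)) / 2 * sigma**2 * np.sum(1.0 / h)

print(f"momentum SGD, step {mu}, beta {beta}: MSD = {msd_momentum:.4f}")
print(f"SGD, step {mu / (1 - beta):.3f}: MSD = {msd_rescaled:.4f}")
print(f"SGD, step {mu}: MSD = {msd_plain:.4f}")
print(f"approximation for re-scaled SGD: {approx:.4f}")
```

With these settings the first two estimates should land close to each other and to the approximation (μ/(1−β))·σ²/2·Σ 1/h_k = 0.03, while plain SGD with step-size μ settles at roughly half that MSD and converges more slowly. In this toy model, momentum trades a higher noise floor for speed, just as a larger step-size would.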

Abstract

The article examines in some detail the convergence rate and mean-square-error performance of momentum stochastic gradient methods in the constant step-size and slow adaptation regime. The results establish that momentum methods are equivalent to the standard stochastic gradient method with a re-scaled (larger) step-size value. The size of the re-scaling is determined by the value of the momentum parameter. The equivalence result is established for all time instants and not only in steady-state. The analysis is carried out for general strongly convex and smooth risk functions, and is not limited to quadratic risks. One notable conclusion is that the well-known benefits of momentum constructions for deterministic optimization problems do not necessarily carry over to the adaptive online setting when small constant step-sizes are used to enable continuous adaptation and learning in the…
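The abstract stresses that the equivalence holds at all time instants, not only in steady state. One way to look at the transient side in a toy model (again a hypothetical sketch, not the paper's analysis) is to take expectations: for a quadratic risk the gradient is linear in w, so with zero-mean gradient noise the mean error E[w_i − w°] follows a deterministic recursion. The helper mean_error_trajectory, the curvatures h, the initial error e0 and the values μ = 0.01, β = 0.5 are illustrative assumptions.

```python
import numpy as np


def mean_error_trajectory(step, beta, h, e0, iters):
    """Mean error e_i = E[w_i - w_opt] of heavy-ball SGD on the quadratic
    J(w) = 0.5 * sum(h * (w - w_opt)**2) with zero-mean gradient noise.

    Taking expectations removes the noise, leaving a deterministic recursion.
    beta = 0 gives plain SGD. Returns an array of shape (iters + 1, dim).
    """
    e_prev = e0.copy()  # e_{-1} = e_0, i.e. zero initial momentum
    e = e0.copy()
    traj = [e.copy()]
    for _ in range(iters):
        e_next = e - step * h * e + beta * (e - e_prev)
        e_prev, e = e, e_next
        traj.append(e.copy())
    return np.array(traj)


h = np.array([1.0, 0.5])   # hypothetical Hessian diagonal
e0 = np.array([-1.0, 2.0])  # w_0 - w_opt for w_0 = 0 and w_opt = (1, -2)
mu, beta = 0.01, 0.5

mom = mean_error_trajectory(mu, beta, h, e0, 600)              # momentum, step mu
sgd = mean_error_trajectory(mu / (1 - beta), 0.0, h, e0, 600)  # SGD, step mu / (1 - beta)
plain = mean_error_trajectory(mu, 0.0, h, e0, 600)             # SGD, step mu

norm = np.linalg.norm
# Largest distance between trajectories over all iterations, relative to ||e_0||.
gap_rescaled = np.max(norm(mom - sgd, axis=1)) / norm(e0)
gap_plain = np.max(norm(mom - plain, axis=1)) / norm(e0)

print(f"max gap, momentum vs re-scaled SGD: {gap_rescaled:.4f}")
print(f"max gap, momentum vs plain SGD:     {gap_plain:.4f}")
```

For this small μ the momentum trajectory and the re-scaled SGD trajectory stay within a few percent of ‖e_0‖ of each other over the whole run, whereas plain SGD with the unscaled step-size μ lags well behind. The agreement degrades as μ grows relative to (1−β)², consistent with the slow adaptation regime the abstract describes.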

Taxonomy

Methods: Spatio-temporal stability analysis