Momentum and Stochastic Momentum for Stochastic Gradient, Newton, Proximal Point and Subspace Descent Methods

Nicolas Loizou; Peter Richtárik

arXiv:1712.09677·math.OC·March 30, 2018

TL;DR

This paper introduces and analyzes stochastic momentum variants for several stochastic optimization algorithms, proving their convergence rates and demonstrating potential efficiency improvements over deterministic momentum in certain regimes.

Contribution

This is the first study of momentum variants for several of these stochastic methods. It establishes their convergence properties and proposes stochastic momentum as a way to reduce computational cost.

Findings

01. Proved global linear convergence rates for stochastic momentum methods.

02. Showed accelerated linear convergence of the primal iterates in the L1 sense.

03. Demonstrated improved complexity of stochastic momentum methods in sparse data regimes.

Abstract

In this paper we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time momentum variants of several of these methods are studied. We choose to perform our analysis in a setting in which all of the above methods are equivalent. We prove global nonasymptotic linear convergence rates for all methods and various measures of success, including primal function values, primal iterates (in L2 sense), and dual function values. We also show that the primal iterates converge at an accelerated linear rate in the L1 sense. This is the first time a linear rate is shown for the stochastic heavy ball method (i.e., stochastic gradient descent method with momentum). Under somewhat weaker conditions,…
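
As a rough illustration of the heavy ball update mentioned in the abstract, the sketch below runs stochastic gradient descent with momentum, x_{k+1} = x_k - omega * g_k + beta * (x_k - x_{k-1}), on a small consistent linear system. The linear-system setting, the randomized Kaczmarz-style stochastic gradient, the function name `heavy_ball_kaczmarz`, and the values of `omega` and `beta` are all assumptions chosen for illustration; the abstract does not specify them, and this is not presented as the authors' implementation.

```python
import random


def heavy_ball_kaczmarz(A, b, omega=1.0, beta=0.3, iters=2000, seed=0):
    """Stochastic heavy ball sketch for a consistent system A x = b.

    Hypothetical helper: omega (stepsize) and beta (momentum) are
    illustrative values, not parameters taken from the paper.
    """
    rng = random.Random(seed)
    n = len(A[0])
    x_prev = [0.0] * n  # x_{k-1}
    x = [0.0] * n       # x_k
    for _ in range(iters):
        # Sample one row uniformly at random.
        i = rng.randrange(len(A))
        row = A[i]
        residual = sum(a * xj for a, xj in zip(row, x)) - b[i]
        norm_sq = sum(a * a for a in row)
        # Stochastic gradient of f_i(x) = (a_i^T x - b_i)^2 / (2 ||a_i||^2).
        grad = [residual / norm_sq * a for a in row]
        # Heavy ball step: gradient step plus momentum term beta * (x_k - x_{k-1}).
        x_next = [
            xj - omega * g + beta * (xj - xp)
            for xj, g, xp in zip(x, grad, x_prev)
        ]
        x_prev, x = x, x_next
    return x


# Small consistent system built from a known solution (illustrative data).
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
x_true = [1.0, -2.0]
b = [sum(a * t for a, t in zip(row, x_true)) for row in A]

x_hat = heavy_ball_kaczmarz(A, b)
error = max(abs(u - v) for u, v in zip(x_hat, x_true))
print("max abs error:", error)
```

Setting `beta=0.0` in this sketch recovers the plain stochastic gradient step without momentum, which makes it easy to compare the two on the same data.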
