An Improved Analysis of Stochastic Gradient Descent with Momentum

Yanli Liu; Yuan Gao; Wotao Yin

arXiv:2007.07989 · math.OC · August 19, 2020 · NeurIPS · 30 cites

TL;DR

This paper analyzes stochastic gradient descent with momentum (SGDM), showing that it converges as fast as SGD for smooth objectives in both strongly convex and nonconvex settings, and that multistage stepsize and momentum schedules are beneficial compared to fixed parameters.

Contribution

It offers the first convergence guarantees for multistage SGDM and clarifies the role of momentum and dynamic parameters in its performance.

Findings

01 SGDM converges as fast as SGD for smooth objectives, in both strongly convex and nonconvex settings.

02 A multistage parameter strategy is beneficial for SGDM compared to using fixed parameters.

03 Numerical experiments support the theoretical results.

Abstract

SGD with momentum (SGDM) has been widely applied in many machine learning tasks, and it is often applied with dynamic stepsizes and momentum weights tuned in a stagewise manner. Despite its empirical advantage over SGD, the role of momentum is still unclear in general, since previous analyses of SGDM either provide worse convergence bounds than those of SGD, or assume Lipschitz or quadratic objectives, which fail to hold in practice. Furthermore, the role of dynamic parameters has not been addressed. In this work, we show that SGDM converges as fast as SGD for smooth objectives under both strongly convex and nonconvex settings. We also establish the first convergence guarantee for the multistage setting, and show that the multistage strategy is beneficial for SGDM compared to using fixed parameters. Finally, we verify these theoretical claims by numerical experiments.
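
To make "stepsizes and momentum weights tuned in a stagewise manner" concrete, here is a minimal sketch of a multistage SGDM loop. It assumes the exponential-moving-average form of the momentum update, m = beta * m + (1 - beta) * g followed by x = x - alpha * m. The function names, the toy noisy quadratic objective, and the stage values are illustrative assumptions, not taken from the paper or its repository.

```python
import random


def multistage_sgdm(grad_fn, x0, stages):
    """Run SGDM over a list of stages.

    Each stage is a (stepsize, momentum_weight, num_iterations) triple;
    parameters are held fixed within a stage and change between stages.
    The momentum buffer is carried over from one stage to the next.
    """
    x = list(x0)
    m = [0.0] * len(x)  # momentum buffer, starts at zero
    for alpha, beta, num_iters in stages:
        for _ in range(num_iters):
            g = grad_fn(x)  # stochastic gradient at the current iterate
            # Assumed update form: moving average of gradients, then a step.
            m = [beta * mi + (1.0 - beta) * gi for mi, gi in zip(m, g)]
            x = [xi - alpha * mi for xi, mi in zip(x, m)]
    return x


# Hypothetical test problem: f(x) = 0.5 * ||x||^2 with Gaussian gradient noise.
rng = random.Random(0)


def noisy_quadratic_grad(x, noise_std=0.1):
    return [xi + rng.gauss(0.0, noise_std) for xi in x]


# Illustrative schedule: the stepsize shrinks from stage to stage.
stages = [(0.5, 0.9, 200), (0.1, 0.9, 200), (0.02, 0.9, 200)]
x_final = multistage_sgdm(noisy_quadratic_grad, [10.0] * 5, stages)
final_norm = sum(xi * xi for xi in x_final) ** 0.5
```

Passing a single stage, such as `[(0.1, 0.9, 600)]`, gives the fixed-parameter baseline that the multistage strategy is compared against.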

Code & Models

Repositories

gao-yuan-hangzhou/improved-analysis-sgdm
pytorch

Videos

An Improved Analysis of Stochastic Gradient Descent with Momentum · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Bandit Algorithms Research