On the Last Iterate Convergence of Momentum Methods

Xiaoyu Li; Mingrui Liu; Francesco Orabona

arXiv:2102.07002 · cs.LG · July 26, 2022 · 1 citation

TL;DR

This paper studies the convergence of the last iterate of SGD with Momentum (SGDM). It shows that standard SGDM with a constant momentum factor can be suboptimal and proposes modified algorithms whose last iterate attains the optimal rate for convex stochastic optimization.

Contribution

The paper proves that the last iterate of standard SGDM can converge at a suboptimal rate, and it introduces Follow-The-Regularized-Leader (FTRL)-based SGDM algorithms with increasing momentum that achieve the optimal rate.

Findings

01

The last iterate of standard SGDM has a suboptimal $\frac{\ln T}{\sqrt{T}}$ convergence rate.

02

FTRL-based SGDM with increasing momentum achieves $O(\frac{1}{\sqrt{T}})$ convergence.

03

Empirical results support theoretical findings.
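
To give a sense of the size of the gap between the two rates in the findings above, the sketch below evaluates $\frac{\ln T}{\sqrt{T}}$ and $\frac{1}{\sqrt{T}}$ for a few horizons. It ignores all problem-dependent constants, which is an assumption made purely for illustration, and the function names are hypothetical.

```python
import math

def rate_with_log(T):
    # ln(T) / sqrt(T): the suboptimal last-iterate rate, constants ignored.
    return math.log(T) / math.sqrt(T)

def rate_optimal(T):
    # 1 / sqrt(T): the optimal rate, constants ignored.
    return 1.0 / math.sqrt(T)

def gap(T):
    # Ratio of the two rates; algebraically this is just ln(T).
    return rate_with_log(T) / rate_optimal(T)

for T in (10**2, 10**4, 10**6):
    print(T, rate_with_log(T), rate_optimal(T), gap(T))
```

The ratio grows only logarithmically in $T$, so the gap is modest at any practical horizon, but it does not vanish as $T$ increases.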

Abstract

SGD with Momentum (SGDM) is a widely used family of algorithms for large-scale optimization of machine learning problems. Yet, when optimizing generic convex functions, no advantage is known for any SGDM algorithm over plain SGD. Moreover, even the most recent results require changes to the SGDM algorithms, like averaging of the iterates and a projection onto a bounded domain, which are rarely used in practice. In this paper, we focus on the convergence rate of the last iterate of SGDM. For the first time, we prove that for any constant momentum factor, there exists a Lipschitz and convex function for which the last iterate of SGDM suffers from a suboptimal convergence rate of $\Omega(\frac{\ln T}{\sqrt{T}})$ after $T$ iterations. Based on this fact, we study a class of (both adaptive and non-adaptive) Follow-The-Regularized-Leader-based SGDM algorithms with increasing momentum and…
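
The distinction between the last iterate and an averaged iterate can be made concrete with a minimal sketch. It assumes a common heavy-ball form of SGDM with a constant momentum factor and a step size decaying as $1/\sqrt{t}$, run on the Lipschitz convex function $f(x) = |x|$ with noisy subgradients. The objective, the noise model, the parameter values, and the function name are all illustrative assumptions, not the paper's algorithm or its lower-bound construction, so the sketch is not expected to reproduce the stated rate.

```python
import math
import random

def sgdm_last_and_average(T=1000, beta=0.9, c=0.1, x0=1.0, noise=0.1, seed=0):
    """Heavy-ball SGDM on f(x) = |x| (illustrative objective, an assumption).

    Returns (f(last iterate), f(average of the iterates produced)).
    """
    rng = random.Random(seed)
    x, m, x_sum = x0, 0.0, 0.0
    for t in range(1, T + 1):
        # Stochastic subgradient of |x|: sign(x) plus zero-mean Gaussian noise.
        sign = 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)
        g = sign + rng.gauss(0.0, noise)
        # Momentum buffer with a constant momentum factor beta.
        m = beta * m + g
        # Step size decaying as c / sqrt(t).
        x = x - (c / math.sqrt(t)) * m
        x_sum += x
    return abs(x), abs(x_sum / T)

last_value, avg_value = sgdm_last_and_average()
print("f(last iterate):   ", last_value)
print("f(average iterate):", avg_value)
```

The first returned value is the quantity that last-iterate guarantees bound; the second is the one that averaging-based analyses bound. Varying `beta` and `T` in this toy setting shows how the two can differ on a single run.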

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Sparse and Compressive Sensing Techniques