On the Last Iterate Convergence of Momentum Methods
Xiaoyu Li, Mingrui Liu, Francesco Orabona

TL;DR
This paper investigates the convergence of the last iterate of momentum methods, showing that standard SGDM is suboptimal and proposing improved algorithms with optimal convergence rates for convex stochastic optimization.
Contribution
It proves that the last iterate of standard SGDM converges at a suboptimal rate and introduces FTRL-based SGDM algorithms with increasing momentum that achieve the optimal rate.
Findings
Standard SGDM last iterate has suboptimal $\frac{\ln T}{\sqrt{T}}$ convergence.
FTRL-based SGDM with increasing momentum achieves $O(\frac{1}{\sqrt{T}})$ convergence.
Empirical results support theoretical findings.
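To make the distinction between the last iterate and the averaged iterate concrete, here is a minimal sketch of SGDM with a constant momentum factor on the Lipschitz convex function f(x) = |x|. It is an illustration under stated assumptions, not the paper's construction or its FTRL-based algorithm: the function names, the exponential-moving-average form of the momentum update, the Gaussian noise model, and the 1/sqrt(T) step size are all choices made here.

```python
import math
import random


def sgdm_last_and_average(subgrad, x0, steps, lr, beta, seed=0):
    """Run 1-D SGDM with constant momentum; return (last iterate, average iterate).

    Assumed update (one common parameterization, not necessarily the paper's):
        m_t     = beta * m_{t-1} + (1 - beta) * g_t
        x_{t+1} = x_t - lr * m_t
    """
    rng = random.Random(seed)
    x, m, running_sum = x0, 0.0, 0.0
    for _ in range(steps):
        g = subgrad(x, rng)               # stochastic subgradient at the current point
        m = beta * m + (1.0 - beta) * g   # momentum buffer
        x = x - lr * m                    # parameter step
        running_sum += x
    return x, running_sum / steps


def noisy_abs_subgrad(x, rng, noise=1.0):
    """Subgradient of f(x) = |x| plus zero-mean Gaussian noise (hypothetical noise model)."""
    sign = 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)
    return sign + rng.gauss(0.0, noise)


if __name__ == "__main__":
    T = 10_000
    last, avg = sgdm_last_and_average(
        noisy_abs_subgrad, x0=1.0, steps=T, lr=1.0 / math.sqrt(T), beta=0.9
    )
    # f(x) = |x| is minimized at 0, so these values are the suboptimality gaps.
    print("f(last iterate)    =", abs(last))
    print("f(average iterate) =", abs(avg))
```

A single run on one function does not demonstrate a lower bound; the sketch only shows which quantity (the final `x` rather than the running average) the last-iterate results are about.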
Abstract
SGD with Momentum (SGDM) is a widely used family of algorithms for large-scale optimization of machine learning problems. Yet, when optimizing generic convex functions, no advantage is known for any SGDM algorithm over plain SGD. Moreover, even the most recent results require changes to the SGDM algorithms, like averaging of the iterates and a projection onto a bounded domain, which are rarely used in practice. In this paper, we focus on the convergence rate of the last iterate of SGDM. For the first time, we prove that for any constant momentum factor, there exists a Lipschitz and convex function for which the last iterate of SGDM suffers from a suboptimal convergence rate of $\Omega(\frac{\ln T}{\sqrt{T}})$ after $T$ iterations. Based on this fact, we study a class of (both adaptive and non-adaptive) Follow-The-Regularized-Leader-based SGDM algorithms with increasing momentum and…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Sparse and Compressive Sensing Techniques
