Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime
Amit Attia, Matan Schliserman, Uri Sherman, Tomer Koren

TL;DR
This paper proves that stochastic gradient descent (SGD) with large stepsizes converges rapidly in the last iterate for smooth convex functions in the interpolation regime, with near-optimal rates that improve upon previous results.
Contribution
It establishes new convergence guarantees for the last iterate of SGD with large stepsizes in the interpolation regime, extending prior work beyond least squares regression.
Findings
Expected excess risk of the SGD last iterate is near optimal, combining $1/T$ and $\frac{\text{noise}}{\sqrt{T}}$ rates.
For zero noise, the SGD last iterate achieves an $O(1/\sqrt{T})$ convergence rate.
Improves upon previous rates in realizable linear regression, especially with large stepsizes.
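The zero-noise finding can be illustrated with a small simulation. The following is a minimal sketch, not the authors' experiment: it assumes a synthetic realizable least-squares problem with unit-norm rows (so each per-sample loss is 1-smooth), runs single-sample SGD with the large stepsize $\eta = 1/\beta$, and records the risk of the last iterate. The names `sgd_last_iterate` and `risk`, and all problem sizes, are hypothetical choices for illustration. On a well-conditioned instance like this one the risk can fall much faster than the worst-case rate, so the sketch shows the setting rather than verifying the bound.

```python
import numpy as np

def sgd_last_iterate(X, y, eta, T, rng):
    """Run T single-sample SGD steps on 0.5*(x.w - y)^2; return the last iterate."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(T):
        i = rng.integers(n)                 # sample one data point uniformly
        grad = (X[i] @ w - y[i]) * X[i]     # stochastic gradient at w
        w = w - eta * grad
    return w

def risk(X, y, w):
    """Average squared loss over the (finite) data distribution."""
    return 0.5 * np.mean((X @ w - y) ** 2)

rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit rows: each sample loss is 1-smooth
w_star = rng.normal(size=d)
y = X @ w_star                                 # realizable: zero noise at the optimum

beta = 1.0           # smoothness constant under the unit-row assumption
eta = 1.0 / beta     # the large constant stepsize discussed in the findings

# Risk of the last iterate for several horizons T (minimum risk is 0 here).
risks = {
    T: risk(X, y, sgd_last_iterate(X, y, eta, T, np.random.default_rng(1)))
    for T in (10, 100, 1000)
}
for T, r in risks.items():
    print(f"T={T:5d}  last-iterate risk={r:.3e}")
```

Because the data are realizable, the excess risk equals the risk itself, which makes the last-iterate behavior easy to read off directly.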
Abstract
We study population convergence guarantees of stochastic gradient descent (SGD) for smooth convex objectives in the interpolation regime, where the noise at optimum is zero or near zero. The behavior of the last iterate of SGD in this setting -- particularly with large (constant) stepsizes -- has received growing attention in recent years due to implications for the training of over-parameterized models, as well as to analyzing forgetting in continual learning and to understanding the convergence of the randomized Kaczmarz method for solving linear systems. We establish that after $T$ steps of SGD on $\beta$-smooth convex loss functions with stepsize $\eta \leq 1/\beta$, the last iterate exhibits expected excess risk $\widetilde{O}\big(1/(\eta T^{1-\beta\eta/2}) + \eta T^{\beta\eta/2}\sigma_\star^2\big)$, where $\sigma_\star^2$ denotes the variance…
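The link to the randomized Kaczmarz method mentioned in the abstract can be made concrete. The sketch below relies on a standard fact rather than anything specific to this paper: one Kaczmarz projection onto the hyperplane $\{v : a^\top v = b\}$ coincides with one SGD step on the per-sample loss $\tfrac{1}{2}(a^\top w - b)^2$ when the stepsize is $1/\|a\|^2$, the reciprocal of that sample's smoothness constant. The function names and the random test data are hypothetical.

```python
import numpy as np

def kaczmarz_step(w, a, b):
    """Project w onto the hyperplane {v : a.v = b}."""
    return w + (b - a @ w) / (a @ a) * a

def sgd_step(w, a, b, eta):
    """One SGD step on the per-sample loss 0.5*(a.w - b)^2."""
    return w - eta * (a @ w - b) * a

rng = np.random.default_rng(0)
a = rng.normal(size=5)      # one row of the linear system (illustrative data)
b = 1.3                     # matching right-hand side entry
w = rng.normal(size=5)      # current iterate

beta_i = a @ a              # smoothness of this sample's loss (its Hessian is a a^T)
w_kaczmarz = kaczmarz_step(w, a, b)
w_sgd = sgd_step(w, a, b, eta=1.0 / beta_i)

print(np.allclose(w_kaczmarz, w_sgd))   # the two updates coincide
print(abs(a @ w_kaczmarz - b))          # residual of the projected equation
```

This is why last-iterate guarantees for SGD with the largest admissible constant stepsize carry over to Kaczmarz-style solvers on consistent systems.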
Taxonomy
Topics: Advanced Numerical Methods in Computational Mathematics
Methods: Stochastic Gradient Descent
