Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime

Amit Attia; Matan Schliserman; Uri Sherman; Tomer Koren

arXiv:2507.11274 · cs.LG · July 29, 2025

TL;DR

This paper proves that stochastic gradient descent (SGD) with large stepsizes converges rapidly in the last iterate for smooth convex functions in the interpolation regime, with near-optimal rates that improve upon previous results.

Contribution

It establishes new convergence guarantees for the last iterate of SGD with large stepsizes in the interpolation regime, extending prior work beyond least squares regression.

Findings

01

Expected excess risk of the SGD last iterate is near optimal, combining $1/T$ and $\frac{\text{noise}}{\sqrt{T}}$ rates.

02

For zero noise, the SGD last iterate achieves an $O(1/\sqrt{T})$ convergence rate.

03

Improves upon previous rates in realizable linear regression, especially with large stepsizes.
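
Finding 02 can be checked against the bound quoted in the abstract. The substitution below is a short worked example, assuming zero noise at the optimum ($\sigma_\star^2 = 0$) and the particular stepsize $\eta = 1/\beta$, which is one admissible choice in $0 < \eta < 2/\beta$ picked here for illustration:

```latex
\begin{align*}
\left.\frac{1}{\eta(2-\beta\eta)\,T^{1-\beta\eta/2}}
  + \frac{\eta}{(2-\beta\eta)^{2}}\,T^{\beta\eta/2}\,\sigma_\star^{2}
\right|_{\eta = 1/\beta,\ \sigma_\star^{2} = 0}
&= \frac{1}{\tfrac{1}{\beta}\,(2-1)\,T^{1-1/2}} + 0 \\
&= \frac{\beta}{\sqrt{T}}.
\end{align*}
```

Under these assumptions the excess-risk bound is $O(\beta/\sqrt{T})$, which is $O(1/\sqrt{T})$ for fixed $\beta$.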

Abstract

We study population convergence guarantees of stochastic gradient descent (SGD) for smooth convex objectives in the interpolation regime, where the noise at optimum is zero or near zero. The behavior of the last iterate of SGD in this setting -- particularly with large (constant) stepsizes -- has received growing attention in recent years due to implications for the training of over-parameterized models, as well as to analyzing forgetting in continual learning and to understanding the convergence of the randomized Kaczmarz method for solving linear systems. We establish that after $T$ steps of SGD on $\beta$-smooth convex loss functions with stepsize $0 < \eta < 2/\beta$, the last iterate exhibits expected excess risk $O\left(\frac{1}{\eta(2-\beta\eta)\,T^{1-\beta\eta/2}} + \frac{\eta}{(2-\beta\eta)^{2}}\,T^{\beta\eta/2}\,\sigma_\star^{2}\right)$, where $\sigma_\star^{2}$ denotes the variance…
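
The setting in the abstract can be illustrated with a small numerical sketch. It is not the authors' code or experiment. It assumes a synthetic realizable least-squares problem with unit-norm rows, so each per-sample loss is 1-smooth and the noise at the optimum is zero. The names `sgd_last_iterate`, `excess_risk`, and `abstract_bound` are hypothetical helpers. `abstract_bound` evaluates the expression inside the $O(\cdot)$ above with the hidden constant taken to be 1, which is an assumption made only for illustration.

```python
import numpy as np


def sgd_last_iterate(X, y, eta, T, rng):
    """Run T steps of single-sample SGD on per-sample losses
    0.5 * (x_i . w - y_i)^2 and return the last iterate (no averaging)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(T):
        i = rng.integers(n)                 # sample one data point uniformly
        grad = (X[i] @ w - y[i]) * X[i]     # stochastic gradient at w
        w = w - eta * grad
    return w


def excess_risk(X, y, w):
    """Average loss over the data; the minimum is 0 in the realizable case,
    so this value is also the excess risk over the empirical distribution."""
    return 0.5 * np.mean((X @ w - y) ** 2)


def abstract_bound(eta, beta, T, sigma_star_sq):
    """Expression inside the O(.) of the abstract, constant assumed to be 1."""
    a = beta * eta
    return (1.0 / (eta * (2.0 - a) * T ** (1.0 - a / 2.0))
            + eta / (2.0 - a) ** 2 * T ** (a / 2.0) * sigma_star_sq)


rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit rows: beta = 1
w_star = rng.normal(size=d)
y = X @ w_star                                  # realizable: sigma_star^2 = 0

beta = 1.0
eta = 1.0 / beta                                # large constant stepsize, inside (0, 2/beta)
T = 5000

risk_start = excess_risk(X, y, np.zeros(d))
w_T = sgd_last_iterate(X, y, eta, T, rng)
risk_T = excess_risk(X, y, w_T)

print("initial risk:", risk_start)
print("last-iterate risk:", risk_T)
print("bound expression at (eta, T):", abstract_bound(eta, beta, T, 0.0))
```

With unit-norm rows and $\eta = 1/\beta = 1$, each update is a randomized Kaczmarz projection, which matches the connection mentioned in the abstract. This toy problem is well conditioned, so the last iterate converges much faster than the general convex bound requires. The sketch only checks that the bound is respected here and says nothing about its tightness.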

Taxonomy

Topics: Advanced Numerical Methods in Computational Mathematics

Methods: Stochastic Gradient Descent