Last-Iterate Complexity of SGD for Convex and Smooth Stochastic Problems

Guillaume Garrigos; Daniel Cortild; Lucas Ketels; Juan Peypouquet

arXiv:2507.14122·math.OC·July 21, 2025

Open Access

TL;DR

This paper proves that stochastic gradient descent (SGD) achieves a $\tilde{O}(T^{-1/2})$ last-iterate convergence rate for convex smooth stochastic problems, without assuming that the variance of the stochastic gradients is uniformly bounded.

Contribution

It establishes the first last-iterate convergence rate for SGD in convex smooth stochastic problems without assuming bounded gradient variance.

Findings

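01. SGD attains a $\tilde{O}(T^{-1/2})$ last-iterate convergence rate on convex smooth stochastic problems.

02. The rate does not rely on the unverifiable assumption that the variance of the stochastic gradients is uniformly bounded.

03. The result addresses the open question of whether bounds can be derived directly on the last iterate of SGD, rather than only on the ergodic function value gap.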

Abstract

Most results on Stochastic Gradient Descent (SGD) in the convex and smooth setting are presented in the form of bounds on the ergodic function value gap. It is an open question whether bounds can be derived directly on the last iterate of SGD in this context. Recent advances suggest that it should be possible. For instance, it can be achieved by making the additional, yet unverifiable, assumption that the variance of the stochastic gradients is uniformly bounded. In this paper, we show that there is no need for such an assumption, and that SGD enjoys a $\tilde{O} (T^{- 1/2})$ last-iterate complexity rate for convex smooth stochastic problems.
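To make the distinction between the ergodic and last-iterate gaps concrete, here is a minimal sketch that runs SGD on a synthetic least-squares problem and reports the function value gap at the last iterate and at the running average of the iterates. Everything in it is an assumption chosen for illustration: the function name `sgd_last_vs_average`, the problem sizes, and the constant step size proportional to $1/\sqrt{T}$ are hypothetical and are not taken from the paper.

```python
import numpy as np


def sgd_last_vs_average(T=2000, n=200, d=10, seed=0):
    """Run SGD on a synthetic least-squares problem and return function value gaps.

    Illustrative only: the problem, step size, and sizes are assumptions,
    not the setting or parameters analyzed in the paper.
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))
    b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

    def f(x):
        # Convex, smooth objective: f(x) = (1 / 2n) * sum_i (a_i^T x - b_i)^2
        r = A @ x - b
        return 0.5 * np.mean(r ** 2)

    # Reference minimizer, used only to measure the gap f(x) - min f.
    x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
    f_star = f(x_star)

    # Each sampled term 0.5 * (a_i^T x - b_i)^2 has smoothness constant ||a_i||^2.
    L_max = np.max(np.sum(A ** 2, axis=1))
    step = 1.0 / (L_max * np.sqrt(T))  # assumed constant step of order 1 / sqrt(T)

    x = np.zeros(d)
    x_avg = np.zeros(d)
    for t in range(T):
        i = rng.integers(n)                # sample one data point uniformly
        grad = (A[i] @ x - b[i]) * A[i]    # unbiased stochastic gradient of f at x
        x_avg += (x - x_avg) / (t + 1)     # running mean of x_0, ..., x_t
        x = x - step * grad                # SGD step

    return {
        "initial": f(np.zeros(d)) - f_star,
        "last": f(x) - f_star,             # last-iterate gap
        "ergodic": f(x_avg) - f_star,      # gap at the averaged iterate
    }


if __name__ == "__main__":
    for name, gap in sgd_last_vs_average().items():
        print(f"{name:>8s} gap: {gap:.4e}")
```

A single run like this says nothing about rates; it only shows which two quantities are being compared. Classical ergodic bounds control the `ergodic` entry, while the result described above concerns the `last` entry.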

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods