Last Iterate Convergence of AdaGrad-Norm for Convex Non-Smooth Optimization
Margarita Preobrazhenskaia, Makar Sidorov, Igor Preobrazhenskii, Eduard Gorbunov

TL;DR
This paper analyzes the convergence rate of the last iterate of AdaGrad in convex non-smooth optimization, establishing a tight $O(1/N^{1/4})$ rate after $N$ iterations and comparing it with the $O(1/N^{1/2})$ rate of the averaged iterate.
Contribution
The paper provides the first worst-case convergence bounds for AdaGrad's last iterate, and shows that the resulting rate is tight and strictly worse than the rate for the averaged iterate.
Findings
The last iterate converges at rate $O(1/N^{1/4})$ with an optimally tuned stepsize parameter.
Matching lower-bound constructions prove that this rate is tight.
The last-iterate rate is strictly worse than the classical $O(1/N^{1/2})$ rate for averaged iterates.
Abstract
We study the convergence of the last iterate (i.e., the $N$-th iterate) of the AdaGrad method. Although AdaGrad, an adaptive subgradient method, underpins a wide class of algorithms, most existing convergence analyses focus on averaged (or best) iterates. We derive worst-case upper bounds on the suboptimality of the final point and show that, with an optimally tuned stepsize parameter, the last iterate converges at the rate $O(1/N^{1/4})$. We complement this guarantee with matching lower-bound constructions, proving that this rate is tight and that AdaGrad's last-iterate rate is strictly worse than the classical $O(1/N^{1/2})$ rate for its averaged iterate. Technically, our analysis introduces an exponent parameter that captures the growth of the cumulative squared subgradients; combined with the last-iterate inequality of Zamani and Glineur (2025), this reduces the problem to…
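To make the two quantities in the abstract concrete, here is a minimal Python sketch of the standard AdaGrad-Norm update, $x_{k+1} = x_k - \gamma\, g_k / \sqrt{\sum_{i \le k} \|g_i\|^2}$, that returns both the last iterate and the averaged iterate. It is an illustration only, not the paper's code: the function names, the toy objective $f(x) = \|x\|_1$, and the choice to average over the points at which subgradients are evaluated are all assumptions, and the sketch makes no attempt to reproduce the worst-case constructions behind the stated rates.

```python
import math


def l1_subgrad(x):
    """A subgradient of f(x) = ||x||_1 (illustrative objective; sign(0) := 0)."""
    return [(xi > 0) - (xi < 0) for xi in x]


def adagrad_norm(subgrad, x0, gamma, num_steps):
    """Run AdaGrad-Norm for num_steps steps.

    Returns (last_iterate, averaged_iterate). The average is taken over the
    points at which subgradients were evaluated (an assumption of this sketch).
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    x = list(x0)
    running_sum = [0.0] * len(x)
    visited = 0
    sum_sq = 0.0  # cumulative squared subgradient norms
    for _ in range(num_steps):
        running_sum = [s + xi for s, xi in zip(running_sum, x)]
        visited += 1
        g = subgrad(x)
        sum_sq += sum(gi * gi for gi in g)
        if sum_sq == 0.0:
            break  # every subgradient so far is zero: x0 is already a minimizer
        step = gamma / math.sqrt(sum_sq)  # scalar, norm-based stepsize
        x = [xi - step * gi for xi, gi in zip(x, g)]
    averaged = [s / visited for s in running_sum]
    return x, averaged


if __name__ == "__main__":
    last, avg = adagrad_norm(l1_subgrad, [1.0, -2.0], gamma=1.0, num_steps=100)
    f = lambda x: sum(abs(xi) for xi in x)
    print("f(last iterate)     =", f(last))
    print("f(averaged iterate) =", f(avg))
```

Here `gamma` plays the role of the stepsize parameter that the abstract says must be tuned. On a single easy instance like this one the two outputs need not reflect the worst-case gap between the $O(1/N^{1/4})$ and $O(1/N^{1/2})$ rates.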
