Last Iterate Convergence of AdaGrad-Norm for Convex Non-Smooth Optimization

Margarita Preobrazhenskaia; Makar Sidorov; Igor Preobrazhenskii; Eduard Gorbunov

arXiv:2604.10728 · math.OC · April 14, 2026

TL;DR

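This paper analyzes the convergence rate of the last iterate of AdaGrad (in its AdaGrad-Norm form) for convex non-smooth optimization. It establishes a tight $O(1/N^{1/4})$ rate under an optimally tuned stepsize and shows that this rate is strictly worse than the $O(1/N^{1/2})$ rate of the averaged iterate.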

Contribution

It provides the first worst-case convergence bounds for AdaGrad's last iterate and shows that the resulting rate is tight and strictly worse than the rate for the averaged iterate.

Findings

01

The last iterate converges at the rate $O(1/N^{1/4})$ with an optimally tuned stepsize.

02

Matching lower bounds prove the rate is tight.

03

The last-iterate rate is strictly worse than the classical $O(1/N^{1/2})$ rate for averaged iterates.
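For reference, one common way to write the AdaGrad-Norm subgradient method is sketched below. This standard parametrization, with an initial accumulator $b_0 > 0$ and a stepsize parameter $\eta$, is an assumption and may differ from the paper's exact notation. Indexing from $x_1$, the last iterate after $N$ steps is $x_{N+1}$, and the averaged iterate is assumed here to be the uniform average of the queried points.

$$
g_k \in \partial f(x_k), \qquad b_k^2 = b_0^2 + \sum_{i=1}^{k} \|g_i\|^2, \qquad x_{k+1} = x_k - \frac{\eta}{b_k}\, g_k, \qquad k = 1, \dots, N,
$$

$$
\bar{x}_N = \frac{1}{N} \sum_{k=1}^{N} x_k .
$$

In this notation, and with $\eta$ tuned optimally, the findings compare $f(x_{N+1}) - f^\star = O(N^{-1/4})$ for the last iterate with $f(\bar{x}_N) - f^\star = O(N^{-1/2})$ for the averaged iterate.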

Abstract

We study the convergence of the last iterate (i.e., the $(N+1)$-th iterate) of the AdaGrad method. Although AdaGrad, an adaptive subgradient method, underpins a wide class of algorithms, most existing convergence analyses focus on averaged (or best) iterates. We derive worst-case upper bounds on the suboptimality of the final point and show that, with an optimally tuned stepsize parameter, the last iterate converges at the rate $O(1/N^{1/4})$. We complement this guarantee with matching lower-bound constructions, proving that this rate is tight and that AdaGrad's last-iterate rate is strictly worse than the classical $O(1/N^{1/2})$ rate for its averaged iterate. Technically, our analysis introduces an exponent parameter that captures the growth of the cumulative squared subgradients; combined with the last-iterate inequality of Zamani and Glineur (2025), this reduces the problem to…
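To make the distinction between the last iterate and the averaged iterate concrete, here is a minimal Python sketch of AdaGrad-Norm. It assumes the standard update $x_{k+1} = x_k - \eta g_k / b_k$ with $b_k^2 = b_0^2 + \sum_{i \le k} \|g_i\|^2$; the paper's exact parametrization may differ. The helper name `adagrad_norm`, the small default `b0`, and the uniform average over queried points are illustrative assumptions, not the authors' code. The test objective $f(x) = \|x\|_1$ is an easy non-smooth instance, not one of the paper's lower-bound constructions.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def adagrad_norm(
    subgrad: Callable[[Vector], Vector],
    x1: Vector,
    eta: float,
    n_steps: int,
    b0: float = 1e-12,
) -> Tuple[Vector, Vector]:
    """Hypothetical helper: run N = n_steps AdaGrad-Norm steps from x1.

    Returns (last, avg): the last iterate x_{N+1} and the uniform average
    of the queried points x_1, ..., x_N (an assumed averaging rule).
    b0 > 0 keeps the first stepsize finite even if g_1 = 0.
    """
    if n_steps < 1:
        raise ValueError("n_steps must be positive")
    x = list(x1)
    acc = b0 ** 2             # b_k^2 = b0^2 + sum_{i<=k} ||g_i||^2
    total = [0.0] * len(x)    # running sum of queried points x_1..x_k
    for _ in range(n_steps):
        g = subgrad(x)                                  # g_k, a subgradient at x_k
        total = [t + xi for t, xi in zip(total, x)]
        acc += sum(gi * gi for gi in g)                 # one scalar (norm) accumulator
        step = eta / math.sqrt(acc)                     # eta / b_k, shared by all coordinates
        x = [xi - step * gi for xi, gi in zip(x, g)]
    avg = [t / n_steps for t in total]
    return x, avg


def l1(x: Vector) -> float:
    """Convex, non-smooth test objective f(x) = ||x||_1 with f* = 0."""
    return sum(abs(xi) for xi in x)


def l1_subgrad(x: Vector) -> Vector:
    """A valid subgradient of ||x||_1: sign(x), choosing 0 where x_i = 0."""
    return [float((xi > 0) - (xi < 0)) for xi in x]


if __name__ == "__main__":
    last, avg = adagrad_norm(l1_subgrad, [1.0, -2.0], eta=1.0, n_steps=1000)
    print("f(last) =", l1(last), " f(avg) =", l1(avg))
```

On this toy problem both points approach the minimizer. The $O(1/N^{1/4})$ versus $O(1/N^{1/2})$ gap described in the abstract is a worst-case statement, so observing it would require the specially constructed hard instances from the lower-bound analysis, which this sketch does not attempt to reproduce.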
