Convergence Rate of the Last Iterate of Stochastic Proximal Algorithms

Kevin Kurian Thomas Vaidyan; Michael P. Friedlander; Ahmet Alacaoglu

arXiv:2602.05489·math.OC·February 6, 2026

TL;DR

This paper establishes last-iterate convergence rates, optimal up to log factors, for stochastic proximal algorithms in convex optimization. It relaxes the common bounded variance assumption, and the results apply to multi-task and federated learning.

Contribution

It proves the $\tilde{O}(1/\sqrt{T})$ convergence rate for the last iterate under relaxed conditions, extending prior results to more general settings.

Findings

01 Achieves optimal convergence rate up to log factors.

02 Applies to graph-guided regularizers in multi-task learning.

03 Relaxes the bounded variance assumption in stochastic algorithms.

Abstract

We analyze two classical algorithms for solving additively composite convex optimization problems where the objective is the sum of a smooth term and a nonsmooth regularizer: proximal stochastic gradient method for a single regularizer; and the randomized incremental proximal method, which uses the proximal operator of a randomly selected function when the regularizer is given as the sum of many nonsmooth functions. We focus on relaxing the bounded variance assumption that is common, yet stringent, for getting last iterate convergence rates. We prove the $\widetilde{O}(1/\sqrt{T})$ rate of convergence for the last iterate of both algorithms under componentwise convexity and smoothness, which is optimal up to log terms. Our results apply directly to graph-guided regularizers that arise in multi-task and federated learning, where the regularizer decomposes as a sum over edges of a…
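The two update rules named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy objectives (a lasso-style least-squares problem and a per-coordinate fit with an edge penalty $|x_u - x_v|$), the constant step size proportional to $1/\sqrt{T}$, the averaging of the regularizer over edges, and all function and parameter names are assumptions made for illustration. Both routines return the last iterate rather than an average of iterates.

```python
import math
import random


def soft_threshold(v, tau):
    """Proximal operator of tau * |.| for a single coordinate."""
    return math.copysign(max(abs(v) - tau, 0.0), v)


def prox_sgd_last_iterate(A, b, lam, T, step_scale=0.1, seed=0):
    """Proximal SGD on (1/n) sum_i 0.5*(a_i.x - b_i)^2 + lam*||x||_1.

    Returns the last iterate x_T. The step size is an assumed constant
    step_scale / sqrt(T), not a schedule taken from the paper.
    """
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [0.0] * d
    eta = step_scale / math.sqrt(T)
    for _ in range(T):
        i = rng.randrange(n)  # sample one smooth component
        residual = sum(a * xj for a, xj in zip(A[i], x)) - b[i]
        # Gradient step on the sampled smooth term, then prox of the regularizer.
        x = [soft_threshold(xj - eta * residual * a, eta * lam)
             for a, xj in zip(A[i], x)]
    return x


def prox_edge(x, u, v, tau):
    """In-place prox of tau * |x[u] - x[v]|: pulls the two coordinates together."""
    diff = x[u] - x[v]
    shift = math.copysign(min(tau, abs(diff) / 2.0), diff)
    x[u] -= shift
    x[v] += shift


def incremental_prox_last_iterate(targets, edges, lam, T, step_scale=0.1, seed=0):
    """Randomized incremental proximal sketch on an assumed toy problem:

    (1/n) sum_i 0.5*(x_i - targets[i])^2 + (lam/m) sum_{(u,v)} |x_u - x_v|.

    Each step samples one smooth term and one edge term, and applies the
    prox of the sampled edge term only. Returns the last iterate.
    """
    rng = random.Random(seed)
    x = [0.0] * len(targets)
    eta = step_scale / math.sqrt(T)
    for _ in range(T):
        i = rng.randrange(len(targets))
        x[i] -= eta * (x[i] - targets[i])  # gradient of the sampled smooth term
        u, v = edges[rng.randrange(len(edges))]
        prox_edge(x, u, v, eta * lam)  # prox of the sampled nonsmooth term
    return x
```

The edge prox has a closed form because the penalty depends only on the difference $x_u - x_v$: the sum of the two coordinates is preserved and their difference is soft-thresholded by $2\tau$. This is what makes sampling one edge per step cheap compared with evaluating the prox of the full graph regularizer.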

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques