Improved Last-Iterate Convergence of Shuffling Gradient Methods for Nonsmooth Convex Optimization

Zijian Liu; Zhengyuan Zhou

arXiv:2505.23056·math.OC·May 30, 2025

TL;DR

This paper proves that the Random Reshuffle (RR) and Single Shuffle (SS) methods converge provably faster than Proximal Gradient Descent (GD) in the last iterate for nonsmooth convex optimization. It gives the first improved last-iterate analysis for this setting, along with a nearly optimal result for the suffix average under RR.

Contribution

It offers the first improved last-iterate convergence analysis for nonsmooth convex functions, showing the superiority of shuffle-based methods over Proximal GD.

Findings

01

RR and SS are provably faster than Proximal GD in nonsmooth cases.

02

First nearly optimal convergence result for the suffix average under the RR scheme.

03

Shuffle methods benefit from randomness, enhancing convergence speed.

Abstract

We study the convergence of the shuffling gradient method, a popular algorithm employed to minimize the finite-sum function with regularization, in which the component functions are passed to (Proximal) Gradient Descent (GD) one at a time, in an order determined by a permutation of their indices. In contrast to its easy implementation and effective performance in practice, the theoretical understanding remains limited. A recent advance by Liu & Zhou (2024b) establishes the first last-iterate convergence results under various settings, especially proving the optimal rates for smooth (strongly) convex optimization. However, their bounds for nonsmooth (strongly) convex functions are only as fast as Proximal GD. In this work, we provide the first improved last-iterate analysis for the nonsmooth case demonstrating that the widely used Random Reshuffle ($RR$) and Single Shuffle…
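The shuffling scheme described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the one-dimensional components f_i(x) = |x - b_i|, the regularizer lam * |x|, the constant step size, the placement of the proximal step once at the end of each epoch (one common formulation), and the hypothetical names `shuffling_prox_gd`, `subgrad_abs`, `prox_l1`, and `objective`.

```python
import random


def subgrad_abs(x, b):
    """Return a subgradient of f(x) = |x - b| (0 is chosen at the kink)."""
    if x > b:
        return 1.0
    if x < b:
        return -1.0
    return 0.0


def prox_l1(x, t):
    """Proximal operator of t * |x| (soft-thresholding)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def objective(x, b, lam):
    """F(x) = (1/n) * sum_i |x - b_i| + lam * |x|."""
    return sum(abs(x - bi) for bi in b) / len(b) + lam * abs(x)


def shuffling_prox_gd(b, lam, x0, epochs, step, scheme="RR", seed=0):
    """Illustrative shuffling (sub)gradient method; returns the last iterate.

    scheme="RR": draw a fresh permutation at the start of every epoch.
    scheme="SS": draw one permutation up front and reuse it in every epoch.
    Assumption: the prox is applied once per epoch with parameter n * step * lam.
    """
    rng = random.Random(seed)
    n = len(b)
    perm = list(range(n))
    if scheme == "SS":
        rng.shuffle(perm)  # single shuffle, fixed for all epochs
    x = x0
    for _ in range(epochs):
        if scheme == "RR":
            rng.shuffle(perm)  # reshuffle at every epoch
        y = x
        for i in perm:  # one pass over the components in permuted order
            y -= step * subgrad_abs(y, b[i])
        x = prox_l1(y, n * step * lam)  # proximal step on the regularizer
    return x


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    for scheme in ("RR", "SS"):
        x_last = shuffling_prox_gd(data, lam=0.1, x0=10.0, epochs=400,
                                   step=0.01, scheme=scheme, seed=1)
        print(scheme, x_last, objective(x_last, data, 0.1))
```

The only difference between the two schemes in this sketch is when the permutation is drawn. The function returns the last iterate rather than an average, which is the quantity the paper's analysis concerns.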

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods