Random Reshuffling with Momentum for Nonconvex Problems: Iteration Complexity and Last Iterate Convergence

Junwen Qiu; Bohao Ma; Andre Milzarek

arXiv:2404.18452 · math.OC · March 24, 2026


TL;DR

This paper analyzes the convergence properties of Random Reshuffling with Momentum (RRM), providing new complexity bounds and convergence guarantees for nonconvex optimization, which are relevant for machine learning applications.

Contribution

The work establishes new complexity bounds and asymptotic convergence guarantees for popular versions of RRM in a general nonconvex setting, covering stochastic heavy-ball momentum, Nesterov acceleration, and mini-batches.

Findings

01

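The base variant of RRM achieves the complexity $O(n^{-1/3}((1-\beta^{n})T)^{-2/3})$, where $n$ is the number of component functions, $\beta \in [0, 1)$ is the momentum parameter, and $T$ is the total number of iterations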

02

Every accumulation point of RRM iterates is a stationary point

03

Sequences of RRM iterates converge to a single stationary point when the objective is definable
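
As a quick reading of the expression in finding 01 (an algebraic sanity check on the bound as stated, not a result taken from the paper): with $n \geq 1$, setting $\beta = 0$ removes the momentum factor entirely, while for any $\beta \in (0, 1)$ the factor $1-\beta^{n}$ lies strictly between 0 and 1 and therefore enlarges the bound relative to the $\beta = 0$ case.

```latex
\[
\beta = 0:\quad n^{-1/3}\bigl((1-\beta^{n})\,T\bigr)^{-2/3} = n^{-1/3}\,T^{-2/3},
\]
\[
\beta \in (0,1):\quad 0 < 1-\beta^{n} < 1
\;\Longrightarrow\;
\bigl((1-\beta^{n})\,T\bigr)^{-2/3} > T^{-2/3}.
\]
```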

Abstract

Random reshuffling with momentum (RRM) corresponds to the SGD optimizer with momentum option enabled, as found in many machine learning libraries like PyTorch and TensorFlow. Despite its widespread use, the convergence properties of RRM do not seem to be well understood. This work establishes new complexity bounds and asymptotic convergence guarantees for popular versions of RRM using stochastic heavy-ball momentum, Nesterov acceleration, and mini-batches in a general nonconvex setting. In particular, we prove that the base variant of RRM achieves the complexity $O(n^{-1/3}((1-\beta^{n})T)^{-2/3})$, where $n$ denotes the number of component functions, $\beta \in [0, 1)$ is a momentum parameter, and $T$ is the total number of iterations. Furthermore, every accumulation point of the iterates $\{x^{k}\}_{k}$ generated by RRM is shown to be a stationary point of the problem. When the…
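
To make the abstract's description concrete, here is a minimal sketch of random reshuffling with heavy-ball momentum on a scalar finite-sum problem. It is an illustration under stated assumptions, not the authors' implementation: the function name `rrm_heavy_ball`, its parameters, and the toy components are hypothetical, the update follows the common buffer form `m = beta * m + g; x = x - lr * m`, and the momentum buffer is assumed to carry over across epochs. The paper's exact scheme (step sizes, buffer handling, Nesterov and mini-batch variants) may differ.

```python
import random
from typing import Callable, Sequence


def rrm_heavy_ball(
    grads: Sequence[Callable[[float], float]],
    x0: float,
    lr: float,
    beta: float,
    epochs: int,
    seed: int = 0,
) -> float:
    """Sketch of random reshuffling with heavy-ball momentum (scalar case).

    grads[i] returns the gradient of the i-th component function at x.
    Assumption: the momentum buffer is kept across epochs.
    """
    rng = random.Random(seed)
    n = len(grads)
    x = x0
    m = 0.0  # momentum buffer
    for _ in range(epochs):
        # Draw a fresh permutation each epoch, so every component is
        # visited exactly once per epoch (sampling without replacement).
        perm = list(range(n))
        rng.shuffle(perm)
        for i in perm:
            m = beta * m + grads[i](x)  # heavy-ball buffer update
            x = x - lr * m              # parameter step
    return x


# Toy components f_i(x) = 0.5 * (x - a_i)^2 (hypothetical example data);
# the full objective is minimized at the mean of the a_i.
targets = [1.0, 2.0, 3.0, 4.0]
component_grads = [lambda x, a=a: x - a for a in targets]

x_final = rrm_heavy_ball(component_grads, x0=0.0, lr=0.01, beta=0.9, epochs=200)
print(x_final)
```

With a constant step size the iterates are expected to settle in a neighborhood of the minimizer rather than at it exactly; setting `beta=0.0` reduces the sketch to plain random reshuffling.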


Code & Models

Repositories

Junwen-Qiu/Random-reshuffling-with-momentum (Official)


Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Complexity and Algorithms in Graphs · Markov Chains and Monte Carlo Methods