Random Reshuffling with Momentum for Nonconvex Problems: Iteration Complexity and Last Iterate Convergence
Junwen Qiu, Bohao Ma, Andre Milzarek

TL;DR
This paper analyzes the convergence properties of Random Reshuffling with Momentum (RRM), the momentum-enabled SGD optimizer widely used in machine learning, and provides new complexity bounds and convergence guarantees for nonconvex optimization.
Contribution
The work establishes the first complexity bounds and convergence guarantees for RRM in general nonconvex settings, covering variants with stochastic heavy-ball momentum, Nesterov acceleration, and mini-batches.
Findings
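The base variant of RRM achieves a complexity of O(n^{-1/3}((1-β^n)T)^{-2/3}), where n is the number of component functions, β is the momentum parameter, and T is the total number of iterations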
Every accumulation point of RRM iterates is a stationary point
Sequences of RRM iterates converge to a single stationary point when the objective is definable
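To see how the first finding scales, the sketch below evaluates the expression n^{-1/3}((1-β^n)T)^{-2/3} with all constants dropped. It only illustrates how the formula depends on n, β, and T; the helper `rrm_bound` and the sample values are hypothetical, and the sketch makes no claim about which stationarity measure the paper bounds or what its constants are.

```python
def rrm_bound(n: int, beta: float, T: int) -> float:
    """Evaluate n^(-1/3) * ((1 - beta^n) * T)^(-2/3), with all constants dropped."""
    if n < 1 or T < 1:
        raise ValueError("n and T must be positive")
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    return n ** (-1.0 / 3.0) * ((1.0 - beta ** n) * T) ** (-2.0 / 3.0)


T = 10**6
# Large n: beta^n is negligible, so the momentum factor (1 - beta^n) is essentially 1.
large_n = (rrm_bound(1000, 0.0, T), rrm_bound(1000, 0.9, T))
# Small n: the factor (1 - beta^n)^(-2/3) becomes visible.
small_n_ratio = rrm_bound(10, 0.9, T) / rrm_bound(10, 0.0, T)
# Doubling T shrinks the bound by a factor of 2^(-2/3).
t_ratio = rrm_bound(10, 0.9, 2 * T) / rrm_bound(10, 0.9, T)
print(large_n, small_n_ratio, t_ratio)
```

For n = 10 and β = 0.9 the momentum factor (1 - 0.9^10)^{-2/3} is about 1.33, while for n = 1000 it is indistinguishable from 1, so the dependence on β fades as the number of component functions grows.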
Abstract
Random reshuffling with momentum (RRM) corresponds to the SGD optimizer with the momentum option enabled, as found in many machine learning libraries like PyTorch and TensorFlow. Despite its widespread use, the convergence properties of RRM do not seem to be well understood. This work establishes new complexity bounds and asymptotic convergence guarantees for popular versions of RRM using stochastic heavy-ball momentum, Nesterov acceleration, and mini-batches in a general nonconvex setting. In particular, we prove that the base variant of RRM achieves the complexity O(n^{-1/3}((1-β^n)T)^{-2/3}), where n denotes the number of component functions, β is a momentum parameter, and T is the total number of iterations. Furthermore, every accumulation point of the iterates generated by RRM is shown to be a stationary point of the problem. When the…
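Because the abstract identifies RRM with momentum-enabled SGD, a small loop makes the three variants concrete. The following is a minimal sketch, assuming a scalar decision variable and PyTorch's momentum convention with zero dampening (m ← βm + g, then x ← x - αm; for Nesterov, x ← x - α(g + βm)). Each epoch draws a fresh permutation of the n components and sweeps through it, optionally in mini-batches. The paper's exact parametrization of the momentum step may differ; `rrm`, its arguments, and the quadratic test problem are hypothetical illustrations, not the authors' code.

```python
import random
from typing import Callable, List, Sequence

Grad = Callable[[float], float]


def rrm(
    grads: Sequence[Grad],
    x0: float,
    lr: float,
    beta: float,
    epochs: int,
    batch_size: int = 1,
    nesterov: bool = False,
    seed: int = 0,
) -> float:
    """Random reshuffling with heavy-ball or Nesterov momentum (illustrative sketch).

    Assumed PyTorch-style update with zero dampening:
        m <- beta * m + g
        x <- x - lr * m               (heavy ball)
        x <- x - lr * (g + beta * m)  (Nesterov)
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    rng = random.Random(seed)
    n = len(grads)
    x, m = x0, 0.0
    for _ in range(epochs):
        perm: List[int] = list(range(n))
        rng.shuffle(perm)  # new permutation each epoch: sampling without replacement
        for start in range(0, n, batch_size):
            batch = perm[start:start + batch_size]
            g = sum(grads[i](x) for i in batch) / len(batch)  # averaged mini-batch gradient
            m = beta * m + g  # momentum buffer carries over across epochs
            x -= lr * ((g + beta * m) if nesterov else m)
    return x


# Hypothetical test problem: f_i(x) = 0.5 * (x - a_i)^2, so f is minimized at mean(a) = 3.0.
A = [1.0, 2.0, 3.0, 6.0]
component_grads = [lambda x, a=a: x - a for a in A]

x_hb = rrm(component_grads, x0=0.0, lr=1e-3, beta=0.9, epochs=5000)
x_nag = rrm(component_grads, x0=0.0, lr=1e-3, beta=0.9, epochs=5000, nesterov=True)
x_mb = rrm(component_grads, x0=0.0, lr=1e-3, beta=0.9, epochs=5000, batch_size=2)
print(x_hb, x_nag, x_mb)
```

A convex quadratic is used only so the answer is easy to check; the analysis in the paper targets nonconvex objectives. In PyTorch, a comparable loop arises from `torch.optim.SGD` with `momentum=β` (and `nesterov=True` for the accelerated variant) combined with a `DataLoader` created with `shuffle=True`, which draws a new permutation every epoch. With a constant step size, the reshuffled iterates generally settle in a small neighborhood of the minimizer rather than converging exactly, which is why the example uses a small `lr`.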
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Complexity and Algorithms in Graphs · Markov Chains and Monte Carlo Methods
