Why Random Reshuffling Beats Stochastic Gradient Descent
Mert Gürbüzbalaban, Asuman Ozdaglar, Pablo Parrilo

TL;DR
This paper provides a rigorous convergence analysis of the random reshuffling (RR) method, demonstrating its superior convergence rate over stochastic gradient descent (SGD) for strongly convex functions, and introduces a modified RR with even faster convergence.
Contribution
It offers the first theoretical convergence rate analysis of RR, showing that it outperforms SGD, and proposes a modified RR with an accelerated convergence rate.
Findings
RR with iterate averaging and a diminishing stepsize Θ(1/k^s), s ∈ (1/2, 1), converges at rate Θ(1/k^{2s}) for smooth strongly convex functions.
This improves on SGD, whose convergence rate is Ω(1/k).
A modified RR achieves a convergence rate of O(1/k^2).
Abstract
We analyze the convergence rate of the random reshuffling (RR) method, which is a randomized first-order incremental algorithm for minimizing a finite sum of convex component functions. RR proceeds in cycles, picking a uniformly random order (permutation) and processing the component functions one at a time according to this order, i.e., at each cycle, each component function is sampled without replacement from the collection. Though RR has been numerically observed to outperform its with-replacement counterpart stochastic gradient descent (SGD), characterization of its convergence rate has been a long-standing open question. In this paper, we answer this question by showing that when the component functions are quadratics or smooth and the sum function is strongly convex, RR with iterate averaging and a diminishing stepsize α_k = Θ(1/k^s) for s ∈ (1/2, 1) converges at rate…
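The sampling difference described in the abstract can be illustrated with a minimal sketch. It assumes a toy one-dimensional problem with quadratic components f_i(x) = 0.5 * h_i * (x - a_i)^2. The helper names (make_problem, run), the constant c, the exponent s = 0.75, and the stepsize form c / k^s are illustrative assumptions, not the authors' implementation. Each cycle either draws a fresh permutation (RR, without replacement) or draws n indices independently (SGD, with replacement), and the end-of-cycle iterates are averaged.

```python
import random


def make_problem(n=20, seed=0):
    """Toy finite sum with f_i(x) = 0.5 * h[i] * (x - a[i])**2 (hypothetical)."""
    rng = random.Random(seed)
    h = [rng.uniform(0.5, 1.5) for _ in range(n)]   # positive curvatures
    a = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # per-component minimizers
    # The sum is minimized where sum_i h[i] * (x - a[i]) = 0.
    x_star = sum(hi * ai for hi, ai in zip(h, a)) / sum(h)
    return h, a, x_star


def run(h, a, cycles, reshuffle, s=0.75, c=0.5, seed=1):
    """Return the average of the end-of-cycle iterates.

    reshuffle=True  -> random reshuffling (each index once per cycle)
    reshuffle=False -> SGD-style sampling with replacement
    """
    rng = random.Random(seed)
    n = len(h)
    x = 0.0
    x_avg = 0.0
    for k in range(1, cycles + 1):
        alpha = c / k ** s  # assumed diminishing stepsize, constant within a cycle
        if reshuffle:
            order = list(range(n))
            rng.shuffle(order)  # uniformly random permutation
        else:
            order = [rng.randrange(n) for _ in range(n)]  # independent draws
        for i in order:
            x -= alpha * h[i] * (x - a[i])  # gradient step on component i
        x_avg += (x - x_avg) / k  # running average of end-of-cycle iterates
    return x_avg


if __name__ == "__main__":
    h, a, x_star = make_problem()
    for name, flag in (("RR", True), ("SGD", False)):
        err = abs(run(h, a, cycles=2000, reshuffle=flag) - x_star)
        print(f"{name}: |x_avg - x*| = {err:.2e}")
```

A single run on such a small problem only shows that both schemes approach the minimizer. Comparing rates empirically would require many seeds and a range of cycle counts.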
