Why Random Reshuffling Beats Stochastic Gradient Descent

Mert Gürbüzbalaban; Asuman Ozdaglar; Pablo Parrilo

arXiv:1510.08560·math.OC·February 9, 2022·Math. Program.

TL;DR

This paper provides a rigorous convergence analysis of the random reshuffling (RR) method, demonstrating its superior convergence rate over stochastic gradient descent (SGD) for strongly convex functions, and introduces a modified RR with even faster convergence.

Contribution

It provides the first theoretical convergence rate analysis of RR, shows that RR outperforms SGD, and proposes a modified RR with a faster convergence rate.

Findings

01

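RR with iterate averaging and a diminishing stepsize α_k = Θ(1/k^s), s ∈ (1/2, 1), converges at rate Θ(1/k^{2s}) when the component functions are smooth and their sum is strongly convex.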

02

RR outperforms SGD, whose convergence rate is Ω(1/k).

03

A modified RR achieves a convergence rate of O(1/k^2).
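A one-line comparison of the rates in findings 01 and 02, using the stepsize range s ∈ (1/2, 1) stated in the abstract, shows why the RR rate eventually beats SGD's Ω(1/k):

```latex
\frac{1/k^{2s}}{1/k} = k^{1-2s} \longrightarrow 0 \quad (k \to \infty),
\qquad \text{since } s \in \left(\tfrac{1}{2}, 1\right) \implies 1 - 2s < 0.
% Example: s = 3/4 gives \Theta(1/k^{3/2}) for RR versus \Omega(1/k) for SGD.
```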

Abstract

We analyze the convergence rate of the random reshuffling (RR) method, which is a randomized first-order incremental algorithm for minimizing a finite sum of convex component functions. RR proceeds in cycles, picking a uniformly random order (permutation) and processing the component functions one at a time according to this order, i.e., at each cycle, each component function is sampled without replacement from the collection. Though RR has been numerically observed to outperform its with-replacement counterpart stochastic gradient descent (SGD), characterization of its convergence rate has been a long-standing open question. In this paper, we answer this question by showing that when the component functions are quadratics or smooth and the sum function is strongly convex, RR with iterate averaging and a diminishing stepsize $\alpha_k = \Theta(1/k^s)$ for $s \in (1/2, 1)$ converges at rate…
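The sampling difference described in the abstract can be illustrated with a minimal sketch. It assumes hypothetical one-dimensional quadratic components f_i(x) = a_i (x - b_i)^2 / 2, a stepsize α_k = α_0 / k^s with the illustrative values α_0 = 0.1 and s = 0.75, and a running average of the end-of-cycle iterates. The function names and problem data are made up for illustration; this is not the authors' code, and their exact averaging scheme may differ. The only difference between the two methods is how the n indices of each cycle are drawn: a fresh uniform permutation for RR (without replacement) versus n independent uniform draws for SGD (with replacement).

```python
import random


def make_quadratics(n, seed=0):
    """Hypothetical test problem: f_i(x) = 0.5 * a_i * (x - b_i)**2 with a_i > 0."""
    rng = random.Random(seed)
    return [(rng.uniform(0.5, 2.0), rng.uniform(-5.0, 5.0)) for _ in range(n)]


def minimizer(comps):
    # F(x) = sum_i f_i(x) is minimized at the a-weighted mean of the b_i.
    return sum(a * b for a, b in comps) / sum(a for a, _ in comps)


def objective(comps, x):
    return sum(0.5 * a * (x - b) ** 2 for a, b in comps)


def run(comps, cycles, reshuffle, alpha0=0.1, s=0.75, x0=0.0, seed=1):
    """Run `cycles` passes of n incremental gradient steps each.

    reshuffle=True:  random reshuffling (new permutation every cycle).
    reshuffle=False: SGD (n indices drawn uniformly with replacement).
    Returns the running average of the end-of-cycle iterates.
    """
    rng = random.Random(seed)
    n = len(comps)
    x, avg = x0, 0.0
    for k in range(1, cycles + 1):
        alpha = alpha0 / k ** s  # diminishing stepsize alpha_k = alpha0 / k^s
        if reshuffle:
            order = list(range(n))
            rng.shuffle(order)  # each component used exactly once per cycle
        else:
            order = [rng.randrange(n) for _ in range(n)]  # with replacement
        for i in order:
            a, b = comps[i]
            x -= alpha * a * (x - b)  # gradient step on the single component f_i
        avg += (x - avg) / k  # running mean of end-of-cycle iterates 1..k
    return avg


if __name__ == "__main__":
    comps = make_quadratics(10)
    f_star = objective(comps, minimizer(comps))
    for name, flag in [("RR", True), ("SGD", False)]:
        x_bar = run(comps, cycles=2000, reshuffle=flag)
        print(name, "suboptimality:", objective(comps, x_bar) - f_star)
```

Comparing the printed suboptimality gaps over several values of `cycles` gives a rough empirical feel for the two rates; a single seeded run is only an illustration, not evidence about asymptotic behavior.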
