How Good is SGD with Random Shuffling?

Itay Safran; Ohad Shamir

arXiv:1908.00045 · cs.LG · June 3, 2021 · 5 cites

TL;DR

This paper analyzes the theoretical performance limits of SGD with random shuffling in finite-sum optimization, revealing a fundamental gap between single and multiple reshuffling strategies and establishing lower bounds that match upper bounds in specific cases.

Contribution

The paper provides the first lower bounds on the expected optimization error of SGD under random shuffling heuristics, clarifying the advantages and limitations of each heuristic.

Findings

01. Lower bounds show inherent performance gaps between single and repeated reshuffling.

02. Reshuffling after each pass yields better error rates than a single shuffle.

03. Matching upper bounds are established for univariate quadratic functions.

Abstract

We study the performance of stochastic gradient descent (SGD) on smooth and strongly-convex finite-sum optimization problems. In contrast to the majority of existing theoretical works, which assume that individual functions are sampled with replacement, we focus here on popular but poorly-understood heuristics, which involve going over random permutations of the individual functions. This setting has been investigated in several recent works, but the optimal error rates remain unclear. In this paper, we provide lower bounds on the expected optimization error with these heuristics (using SGD with any constant step size), which elucidate their advantages and disadvantages. In particular, we prove that after $k$ passes over $n$ individual functions, if the functions are re-shuffled after every pass, the best possible optimization error for SGD is at least…
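The sampling schemes contrasted in the abstract can be sketched in a few lines. This is only an illustrative toy, not the paper's construction: the univariate quadratic components f_i(x) = 0.5 * x**2 - b_i * x, the starting point x = 1.0, and the names `make_problem`, `sgd`, and `mean_error` are all assumptions made here for the example.

```python
import random


def make_problem(n, seed=0):
    """Hypothetical finite sum F(x) = (1/n) * sum_i f_i(x) with
    f_i(x) = 0.5 * x**2 - b_i * x. The b_i are centered so that
    sum_i b_i = 0, which puts the minimizer of F at x = 0."""
    rng = random.Random(seed)
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mean = sum(b) / n
    return [bi - mean for bi in b]


def sgd(b, k, step, scheme, seed=0):
    """Run k passes of constant-step SGD over the n components and
    return the final iterate. `scheme` selects how indices are ordered:
    "random_reshuffling", "single_shuffle", "with_replacement",
    or anything else for a fixed order (incremental gradient)."""
    rng = random.Random(seed)
    n = len(b)
    order = list(range(n))
    if scheme == "single_shuffle":
        rng.shuffle(order)  # one permutation, reused in every pass
    x = 1.0  # assumed starting point
    for _ in range(k):
        if scheme == "random_reshuffling":
            rng.shuffle(order)  # fresh permutation for each pass
        elif scheme == "with_replacement":
            order = [rng.randrange(n) for _ in range(n)]  # i.i.d. indices
        for i in order:
            x -= step * (x - b[i])  # gradient of f_i at x is x - b_i
    return x


def mean_error(b, k, step, scheme, trials=200):
    """Average optimization error F(x) - F(0) = 0.5 * x**2 over
    independent runs that differ only in the random seed."""
    total = sum(0.5 * sgd(b, k, step, scheme, seed=t) ** 2 for t in range(trials))
    return total / trials
```

Calling `mean_error(make_problem(50), k, step, scheme)` for each scheme at the same `k` and `step` gives a rough empirical comparison. The numbers depend on the assumed problem and step size, so they illustrate the setup rather than reproduce the paper's bounds.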

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods

Methods: Stochastic Gradient Descent