Stochastic Learning under Random Reshuffling with Constant Step-sizes

Bicheng Ying; Kun Yuan; Stefan Vlaski; Ali H. Sayed

arXiv:1803.07964·cs.LG·January 30, 2019

TL;DR

This paper analyzes why random reshuffling outperforms uniform sampling in stochastic gradient methods with constant step-sizes, showing that it converges to a smaller neighborhood of the minimizer and achieves better steady-state performance.

Contribution

It provides a theoretical analysis of random reshuffling with constant step-sizes, establishing that the iterates settle in a smaller neighborhood of the minimizer than under uniform sampling and deriving explicit steady-state error expressions.

Findings

01

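Random reshuffling converges to a neighborhood of size $O(\mu^{2})$ around the minimizer, smaller than the $O(\mu)$ neighborhood reached under uniform sampling, where $\mu$ is the constant step-size.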

02

Explicit steady-state mean-square-error expressions are derived.

03

The analysis explains the periodic behavior observed in random reshuffling implementations in practice.

Abstract

In empirical risk optimization, it has been observed that stochastic gradient implementations that rely on random reshuffling of the data achieve better performance than implementations that rely on sampling the data uniformly. Recent works have pursued justifications for this behavior by examining the convergence rate of the learning process under diminishing step-sizes. This work focuses on the constant step-size case and strongly convex loss function. In this case, convergence is guaranteed to a small neighborhood of the optimizer albeit at a linear rate. The analysis establishes analytically that random reshuffling outperforms uniform sampling by showing explicitly that iterates approach a smaller neighborhood of size $O(\mu^{2})$ around the minimizer rather than $O(\mu)$. Furthermore, we derive an analytical expression for the steady-state mean-square-error performance of the…
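
The contrast between the two sampling schemes can be illustrated with a small numerical sketch. The example below is not taken from the paper: it assumes a scalar least-squares risk over a synthetic data set, and the names `make_data`, `run_sgd`, and `compare`, as well as the step-size and data-set size, are hypothetical choices made for illustration. It runs constant step-size stochastic gradient descent once with uniform sampling (with replacement) and once with random reshuffling (a fresh permutation every epoch), and estimates the steady-state mean-square deviation from the minimizer at the end of each epoch.

```python
import random


def make_data(num_samples, rng):
    """Synthetic pairs (a, b) with b = a * 1.0 + noise (an assumed toy model)."""
    data = []
    for _ in range(num_samples):
        a = rng.uniform(0.5, 1.5)
        b = a * 1.0 + rng.gauss(0.0, 1.0)
        data.append((a, b))
    return data


def minimizer(data):
    """Exact minimizer of the empirical risk (1/N) * sum 0.5 * (a*w - b)^2."""
    return sum(a * b for a, b in data) / sum(a * a for a, _ in data)


def run_sgd(data, step_size, num_epochs, burn_in, reshuffle, rng):
    """Constant step-size SGD; returns the time-averaged squared deviation
    from the minimizer, measured at the end of each epoch after burn-in."""
    n = len(data)
    w_star = minimizer(data)
    w = 0.0
    total = 0.0
    for epoch in range(num_epochs):
        if reshuffle:
            # Random reshuffling: every sample is used exactly once per epoch.
            order = list(range(n))
            rng.shuffle(order)
        else:
            # Uniform sampling: n independent draws with replacement.
            order = [rng.randrange(n) for _ in range(n)]
        for i in order:
            a, b = data[i]
            # Gradient of 0.5 * (a*w - b)^2 with respect to w.
            w -= step_size * a * (a * w - b)
        if epoch >= burn_in:
            total += (w - w_star) ** 2
    return total / (num_epochs - burn_in)


def compare(seed=0, step_size=0.01, num_samples=20,
            num_epochs=2000, burn_in=200):
    """Run both schemes on the same data and return (uniform, reshuffled)."""
    rng = random.Random(seed)
    data = make_data(num_samples, rng)
    msd_uniform = run_sgd(data, step_size, num_epochs, burn_in, False, rng)
    msd_reshuffle = run_sgd(data, step_size, num_epochs, burn_in, True, rng)
    return msd_uniform, msd_reshuffle


if __name__ == "__main__":
    msd_uniform, msd_reshuffle = compare()
    print(f"uniform sampling   MSD: {msd_uniform:.3e}")
    print(f"random reshuffling MSD: {msd_reshuffle:.3e}")
```

In this toy setting the reshuffled run should settle noticeably closer to the minimizer than the uniformly sampled run, in line with the $O(\mu^{2})$ versus $O(\mu)$ scaling stated in the abstract. The deviation is sampled only at epoch boundaries, and a single toy problem does not by itself verify the scaling law; repeating the experiment over several values of `step_size` would be needed to observe the trend.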
