Random Reshuffling with Variance Reduction: New Analysis and Better Rates

Grigory Malinovsky; Alibek Sailanbayev; Peter Richtárik

arXiv:2104.09342 · cs.LG · April 20, 2021 · 1 citation

TL;DR

This paper provides new theoretical analysis and improved convergence rates for variance-reduced stochastic gradient methods under random reshuffling, including SVRG variants, in both strongly-convex and convex settings.

Contribution

It introduces the first analysis of RR-SVRG with improved convergence rates and extends results to cyclic and shuffle-once variants, along with a generalized variance reduction scheme.

Findings

01. RR-SVRG converges linearly with rate O(κ^{3/2}) in the strongly-convex case.

02. The rate improves to O(κ) in the big data regime (n > O(κ)).

03. The first sublinear rate is established for general convex problems.
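
The notation in these findings can be spelled out as follows. This is a sketch under the usual conventions for finite-sum optimization, not a statement of the paper's exact assumptions: the symbols $L$ (smoothness constant), $\mu$ (strong convexity constant), and $\varepsilon$ (target accuracy) do not appear on this page and are introduced here only for illustration, and whether the count refers to iterations or epochs depends on the paper's own bookkeeping.

```latex
% Finite-sum problem over n data points (standard form, assumed here)
\min_{x \in \mathbb{R}^d} \; f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x)

% Condition number (conventional definition: smoothness over strong convexity)
\kappa = \frac{L}{\mu}

% Conventional reading of a linear rate of order \kappa^{3/2}:
% the number of steps needed to reach accuracy \varepsilon scales as
O\!\left( \kappa^{3/2} \log \frac{1}{\varepsilon} \right),
\quad \text{improving to} \quad
O\!\left( \kappa \log \frac{1}{\varepsilon} \right)
\quad \text{when } n > O(\kappa).
```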

Abstract

Virtually all state-of-the-art methods for training supervised machine learning models are variants of SGD enhanced with a number of additional tricks, such as minibatching, momentum, and adaptive stepsizes. One of the tricks that works so well in practice that it is used as default in virtually all widely used machine learning software is random reshuffling (RR). However, the practical benefits of RR have until very recently been eluding attempts at being satisfactorily explained using theory. Motivated by recent development due to Mishchenko, Khaled and Richtárik (2020), in this work we provide the first analysis of SVRG under Random Reshuffling (RR-SVRG) for general finite-sum problems. First, we show that RR-SVRG converges linearly with the rate $O(\kappa^{3/2})$ in the strongly-convex case, and can be improved further to $O(\kappa)$ in the big data…
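
To make the idea of SVRG under random reshuffling concrete, here is a minimal Python sketch on a ridge regression problem. It is an illustration under stated assumptions, not the authors' algorithm as published: the function names `rr_svrg` and `ridge_solution`, the ridge objective, the stepsize heuristic, and the choice to refresh the reference point once per epoch are all assumptions made for this example. The only ingredients taken from the abstract are the two it names: an SVRG-style control variate and sampling each epoch's data order as a random permutation rather than with replacement.

```python
import numpy as np


def ridge_solution(A, b, lam):
    """Closed-form minimizer of the ridge objective used below (for checking)."""
    n, d = A.shape
    return np.linalg.solve(A.T @ A / n + lam * np.eye(d), A.T @ b / n)


def rr_svrg(A, b, lam=0.1, step=None, epochs=200, seed=0):
    """Illustrative SVRG with random reshuffling on ridge regression.

    Minimizes f(x) = (1/n) * sum_i f_i(x), where
    f_i(x) = 0.5 * (a_i @ x - b_i)**2 + 0.5 * lam * ||x||^2.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    if step is None:
        # Heuristic stepsize (an assumption, not the paper's choice):
        # a fraction of 1 / L_max, with L_max the largest smoothness
        # constant among the individual f_i.
        l_max = np.max(np.sum(A**2, axis=1)) + lam
        step = 1.0 / (4.0 * l_max)

    x = np.zeros(d)
    for _ in range(epochs):
        # Reference point and its full gradient, refreshed once per epoch
        # (assumed schedule for this sketch).
        y = x.copy()
        full_grad = A.T @ (A @ y - b) / n + lam * y

        # Random reshuffling: visit every index exactly once per epoch,
        # in a freshly sampled order.
        for i in rng.permutation(n):
            a_i = A[i]
            grad_x = a_i * (a_i @ x - b[i]) + lam * x  # grad of f_i at x
            grad_y = a_i * (a_i @ y - b[i]) + lam * y  # grad of f_i at y
            # Variance-reduced gradient estimate.
            x = x - step * (grad_x - grad_y + full_grad)
    return x


if __name__ == "__main__":
    data_rng = np.random.default_rng(1)
    A = data_rng.standard_normal((50, 5))
    b = A @ data_rng.standard_normal(5) + 0.1 * data_rng.standard_normal(50)
    x_hat = rr_svrg(A, b, lam=0.1)
    x_star = ridge_solution(A, b, lam=0.1)
    print("distance to closed-form solution:", np.linalg.norm(x_hat - x_star))
```

The control variate `grad_x - grad_y + full_grad` equals the full gradient when `x == y` and vanishes at the minimizer, which is why a constant stepsize suffices here; plain reshuffled SGD would instead stall at a stepsize-dependent neighborhood of the solution.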

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Privacy-Preserving Technologies in Data

Methods: Stochastic Gradient Descent