Improved Analysis and Rates for Variance Reduction under Without-replacement Sampling Orders

Xinmeng Huang; Kun Yuan; Xianghui Mao; Wotao Yin

arXiv:2104.12112·cs.LG·October 28, 2021·1 cites

Open Access

TL;DR

This paper improves the theoretical understanding and convergence rates of variance reduction methods under without-replacement sampling orders, introducing Prox-DFinito and optimal cyclic ordering to achieve state-of-the-art results.

Contribution

It develops a damped variant of Finito called Prox-DFinito with improved convergence guarantees under various without-replacement sampling schemes.

Findings

1. Prox-DFinito's convergence rates match those of full-batch gradient descent.

2. Optimal cyclic sampling can achieve convergence that is independent of data heterogeneity.

3. The analysis guides the design of optimal sampling orders.

Abstract

When applying a stochastic algorithm, one must choose an order to draw samples. The practical choices are without-replacement sampling orders, which are empirically faster and more cache-friendly than uniform-iid-sampling but often have inferior theoretical guarantees. Without-replacement sampling is well understood only for SGD without variance reduction. In this paper, we improve the convergence analysis and rates of variance reduction under without-replacement sampling orders for composite finite-sum minimization. Our results are twofold. First, we develop a damped variant of Finito called Prox-DFinito and establish its convergence rates with random reshuffling, cyclic sampling, and shuffling-once, under both convex and strongly convex scenarios. These rates match full-batch gradient descent and are state-of-the-art compared to the existing results for without-replacement…
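The sampling orders named in the abstract differ only in how the index sequence for each epoch is produced. The sketch below illustrates that difference and nothing more: it is not Prox-DFinito and is not taken from the paper. The function names, the `seed` parameter, and the use of Python's standard `random` module are all assumptions made for illustration.

```python
import random
from typing import Iterator, List


def cyclic_order(n: int, epochs: int) -> Iterator[List[int]]:
    """Cyclic sampling: visit indices 0..n-1 in the same fixed order every epoch."""
    for _ in range(epochs):
        yield list(range(n))


def shuffle_once_order(n: int, epochs: int, seed: int = 0) -> Iterator[List[int]]:
    """Shuffling-once: draw one random permutation, then reuse it every epoch."""
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    for _ in range(epochs):
        yield list(perm)  # copy so callers cannot mutate the shared order


def random_reshuffling_order(n: int, epochs: int, seed: int = 0) -> Iterator[List[int]]:
    """Random reshuffling: draw a fresh random permutation at the start of each epoch."""
    rng = random.Random(seed)
    for _ in range(epochs):
        perm = list(range(n))
        rng.shuffle(perm)
        yield perm


def uniform_iid_order(n: int, epochs: int, seed: int = 0) -> Iterator[List[int]]:
    """Uniform iid sampling (with replacement), shown only as the baseline for contrast."""
    rng = random.Random(seed)
    for _ in range(epochs):
        yield [rng.randrange(n) for _ in range(n)]


if __name__ == "__main__":
    for name, gen in [
        ("cyclic", cyclic_order(5, 2)),
        ("shuffle-once", shuffle_once_order(5, 2)),
        ("random reshuffling", random_reshuffling_order(5, 2)),
        ("uniform iid", uniform_iid_order(5, 2)),
    ]:
        print(name, list(gen))
```

In the three without-replacement orders every epoch is a permutation of the indices, so each sample is visited exactly once per epoch. Under uniform iid sampling an index may repeat or be skipped within an epoch.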

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Statistical Methods and Inference

Methods: Stochastic Gradient Descent