Improved Analysis and Rates for Variance Reduction under Without-replacement Sampling Orders
Xinmeng Huang, Kun Yuan, Xianghui Mao, Wotao Yin

TL;DR
This paper improves the theoretical understanding and convergence rates of variance reduction methods under without-replacement sampling orders, introducing Prox-DFinito and optimal cyclic ordering to achieve state-of-the-art results.
Contribution
It develops a damped variant of Finito called Prox-DFinito with improved convergence guarantees under various without-replacement sampling schemes.
Findings
Prox-DFinito matches full-batch gradient descent rates.
Optimal cyclic sampling can achieve data-heterogeneity-independent convergence.
The analysis guides the design of optimal sampling orders.
Abstract
When applying a stochastic algorithm, one must choose an order in which to draw samples. The practical choices are without-replacement sampling orders, which are empirically faster and more cache-friendly than uniform i.i.d. sampling but often have inferior theoretical guarantees. Without-replacement sampling is well understood only for SGD without variance reduction. In this paper, we improve the convergence analysis and rates of variance reduction under without-replacement sampling orders for composite finite-sum minimization. Our results are twofold. First, we develop a damped variant of Finito called Prox-DFinito and establish its convergence rates with random reshuffling, cyclic sampling, and shuffling-once, under both convex and strongly convex scenarios. These rates match full-batch gradient descent and are state-of-the-art compared to the existing results for without-replacement…
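The sampling orders named in the abstract can be illustrated with a minimal sketch. This is not code from the paper and does not implement Prox-DFinito; it only shows how the per-epoch index sequences differ. The function names, the `seed` parameter, and the optional `order` argument are assumptions made for illustration.

```python
import random
from typing import Iterator, List, Optional, Sequence


def uniform_iid(n: int, epochs: int, seed: int = 0) -> Iterator[List[int]]:
    """With-replacement baseline: every draw is independent and uniform."""
    rng = random.Random(seed)
    for _ in range(epochs):
        yield [rng.randrange(n) for _ in range(n)]


def cyclic(n: int, epochs: int,
           order: Optional[Sequence[int]] = None) -> Iterator[List[int]]:
    """Cyclic sampling: the same fixed, deterministic order in every epoch."""
    fixed = list(range(n)) if order is None else list(order)
    for _ in range(epochs):
        yield list(fixed)


def shuffle_once(n: int, epochs: int, seed: int = 0) -> Iterator[List[int]]:
    """Shuffling-once: draw one random permutation, then reuse it every epoch."""
    rng = random.Random(seed)
    fixed = list(range(n))
    rng.shuffle(fixed)
    for _ in range(epochs):
        yield list(fixed)


def random_reshuffling(n: int, epochs: int, seed: int = 0) -> Iterator[List[int]]:
    """Random reshuffling: draw a fresh random permutation at each epoch."""
    rng = random.Random(seed)
    for _ in range(epochs):
        perm = list(range(n))
        rng.shuffle(perm)
        yield perm


if __name__ == "__main__":
    n, epochs = 5, 2
    for name, gen in [
        ("uniform i.i.d.", uniform_iid(n, epochs)),
        ("cyclic", cyclic(n, epochs)),
        ("shuffling-once", shuffle_once(n, epochs)),
        ("random reshuffling", random_reshuffling(n, epochs)),
    ]:
        print(name, list(gen))
```

In the three without-replacement schemes every sample index appears exactly once per epoch, whereas the i.i.d. baseline may repeat or skip indices within an epoch. The `order` argument of `cyclic` is where a hand-chosen fixed ordering could be supplied.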
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Statistical Methods and Inference
Methods: Stochastic Gradient Descent
