Sampling without Replacement Leads to Faster Rates in Finite-Sum Minimax Optimization
Aniket Das, Bernhard Schölkopf, Michael Muehlebach

TL;DR
This paper shows that sampling data without replacement, through strategies such as Random Reshuffling and Single Shuffling, yields faster convergence in finite-sum minimax optimization than sampling with replacement.
Contribution
It provides a unified analysis of the without-replacement sampling strategies Random Reshuffling and Single Shuffling, establishing faster convergence rates for both strongly convex-strongly concave and nonconvex-nonconcave minimax problems.
Findings
Without-replacement sampling strategies lead to faster convergence than uniform sampling.
The analysis covers both strongly convex-strongly concave and nonconvex-nonconcave settings.
Results are robust to data-ordering attacks and recover known rates for incremental gradient methods.
Abstract
We analyze the convergence rates of stochastic gradient algorithms for smooth finite-sum minimax optimization and show that, for many such algorithms, sampling the data points without replacement leads to faster convergence compared to sampling with replacement. For the smooth and strongly convex-strongly concave setting, we consider gradient descent ascent and the proximal point method, and present a unified analysis of two popular without-replacement sampling strategies, namely Random Reshuffling (RR), which shuffles the data every epoch, and Single Shuffling or Shuffle Once (SO), which shuffles only at the beginning. We obtain tight convergence rates for RR and SO and demonstrate that these strategies lead to faster convergence than uniform sampling. Moving beyond convexity, we obtain similar results for smooth nonconvex-nonconcave objectives satisfying a two-sided…
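The three sampling schemes named in the abstract differ only in how the order of component indices is produced each epoch. The following is a minimal sketch of that difference, not the authors' implementation. It assumes a toy scalar objective f_i(x, y) = x²/2 + b_i·x·y − y²/2 + c_i·x − d_i·y, which is strongly convex in x and strongly concave in y, and runs simultaneous gradient descent ascent with a constant step size. The names `epoch_orders` and `run_gda`, the toy objective, the step size, and the data are all illustrative assumptions.

```python
import random


def epoch_orders(n, epochs, strategy, rng):
    """Yield one list of n component indices per epoch.

    strategy: "uniform" (with replacement), "RR" (reshuffle every epoch),
    or "SO" (shuffle once and reuse the same order).
    """
    fixed = list(range(n))
    if strategy == "SO":
        rng.shuffle(fixed)  # single shuffle before the first epoch
    for _ in range(epochs):
        if strategy == "uniform":
            # n independent draws; indices may repeat or be skipped
            yield [rng.randrange(n) for _ in range(n)]
        elif strategy == "RR":
            perm = list(range(n))
            rng.shuffle(perm)  # fresh permutation every epoch
            yield perm
        elif strategy == "SO":
            yield list(fixed)  # same permutation every epoch
        else:
            raise ValueError(f"unknown strategy: {strategy}")


def run_gda(b, c, d, epochs, lr, strategy, seed=0):
    """Stochastic gradient descent ascent on the assumed toy objective."""
    rng = random.Random(seed)
    x, y = 1.0, 1.0
    for order in epoch_orders(len(b), epochs, strategy, rng):
        for i in order:
            grad_x = x + b[i] * y + c[i]  # d f_i / d x
            grad_y = b[i] * x - y - d[i]  # d f_i / d y
            # simultaneous update: descend in x, ascend in y
            x, y = x - lr * grad_x, y + lr * grad_y
    return x, y


# Hypothetical data for illustration only.
data_rng = random.Random(42)
n = 20
b = [data_rng.uniform(-1, 1) for _ in range(n)]
c = [data_rng.uniform(-1, 1) for _ in range(n)]
d = [data_rng.uniform(-1, 1) for _ in range(n)]

for strategy in ("uniform", "RR", "SO"):
    x, y = run_gda(b, c, d, epochs=50, lr=0.01, strategy=strategy)
    print(strategy, x, y)
```

Under RR and SO every component is visited exactly once per epoch, whereas uniform sampling can repeat or miss components within an epoch. A single run of this toy problem illustrates the mechanics only and is not evidence for the convergence rates the paper establishes.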
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods
