Permutation-Based SGD: Is Random Optimal?
Shashank Rajput, Kangwook Lee, Dimitris Papailiopoulos

TL;DR
This paper asks whether random permutations are optimal for permutation-based stochastic gradient descent (SGD). The answer depends on the function class: the convergence gap between the best permutation and a random one ranges from exponential to nonexistent.
Contribution
The paper demonstrates that the optimality of random permutations in SGD depends on the function class, providing examples where specific permutations outperform random choices.
Findings
For 1-dimensional strongly convex functions with smooth second derivatives, there exist permutations that converge exponentially faster than random permutations.
For general strongly convex functions, random permutations are optimal.
For quadratic, strongly convex functions, easy-to-construct permutations give accelerated convergence compared to random permutations.
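To make the sampling schemes in these findings concrete, here is a minimal Python sketch (not the authors' code) of per-sample SGD on a 1-dimensional sum of quadratics, F(x) = (1/n) Σ ½·a[i]·(x − b[i])². It runs the same update under three index orderings: with-replacement sampling, a fresh random permutation each epoch, and one fixed permutation reused every epoch. The function names, step size, epoch count, and synthetic data are all assumptions chosen for illustration, and the fixed permutation is an arbitrary one, not one of the paper's constructions.

```python
import random


def sgd_epochs(a, b, order_fn, epochs, lr, x0=0.0):
    """Per-sample SGD on F(x) = (1/n) * sum_i 0.5 * a[i] * (x - b[i])**2.

    order_fn(epoch, n) returns the sequence of component indices to visit
    during that epoch; this is the only thing that differs between schemes.
    """
    x = x0
    n = len(a)
    for epoch in range(epochs):
        for i in order_fn(epoch, n):
            # Gradient of the i-th component is a[i] * (x - b[i]).
            x -= lr * a[i] * (x - b[i])
    return x


def with_replacement(rng):
    """Draw n indices independently and uniformly each epoch."""
    return lambda epoch, n: [rng.randrange(n) for _ in range(n)]


def random_reshuffling(rng):
    """Visit every index exactly once per epoch, in a fresh random order."""
    def order(epoch, n):
        idx = list(range(n))
        rng.shuffle(idx)
        return idx
    return order


def fixed_permutation(perm):
    """Reuse the same (hypothetical, hand-picked) permutation every epoch."""
    return lambda epoch, n: list(perm)


# Synthetic problem (illustrative values only).
data_rng = random.Random(0)
n = 32
a = [data_rng.uniform(0.5, 1.5) for _ in range(n)]
b = [data_rng.gauss(0.0, 1.0) for _ in range(n)]
x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)  # minimizer of F

schemes = {
    "with replacement": with_replacement(random.Random(1)),
    "random reshuffling": random_reshuffling(random.Random(1)),
    "fixed permutation": fixed_permutation(range(n)),
}
for name, order_fn in schemes.items():
    x = sgd_epochs(a, b, order_fn, epochs=200, lr=0.01)
    print(f"{name:>20}: |x - x*| = {abs(x - x_star):.3e}")
```

A single run like this only shows how the orderings plug into the same loop; the printed errors depend on the seed and step size and should not be read as evidence for or against any of the paper's rates.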
Abstract
A recent line of ground-breaking results for permutation-based SGD has corroborated a widely observed phenomenon: random permutations offer faster convergence than with-replacement sampling. However, is random optimal? We show that this depends heavily on what functions we are optimizing, and the convergence gap between optimal and random permutations can vary from exponential to nonexistent. We first show that for 1-dimensional strongly convex functions, with smooth second derivatives, there exist permutations that offer exponentially faster convergence compared to random. However, for general strongly convex functions, random permutations are optimal. Finally, we show that for quadratic, strongly-convex functions, there are easy-to-construct permutations that lead to accelerated convergence compared to random. Our results suggest that a general convergence characterization of optimal…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Complexity and Algorithms in Graphs
Methods: Stochastic Gradient Descent
