Provable Benefit of Random Permutations over Uniform Sampling in Stochastic Coordinate Descent
Donghwa Kim, Jaewook Lee, Chulhee Yun

TL;DR
This paper gives the first provable performance gap between random-permutation coordinate descent (RPCD) and random coordinate descent (RCD): for a class of quadratic functions with permutation-invariant structure, RPCD's contraction rate is strictly better than RCD's, which helps explain RPCD's empirical advantage.
Contribution
The authors prove that RPCD has a strictly better contraction rate than RCD for certain quadratic functions, filling a key theoretical gap in understanding their performance.
Findings
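For a class of quadratics with permutation-invariant structure, the contraction rate upper bound for RPCD is strictly smaller than the contraction rate lower bound for RCD on every individual problem instance.
The authors conjecture that this function class contains the worst-case examples for RPCD.
Under that conjecture, the results extend to all positive-definite quadratic functions.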
Abstract
We analyze the convergence rates of two popular variants of coordinate descent (CD): random CD (RCD), in which the coordinates are sampled uniformly at random, and random-permutation CD (RPCD), in which random permutations are used to select the update indices. Despite abundant empirical evidence that RPCD outperforms RCD in various tasks, the theoretical gap between the two algorithms' performance has remained elusive. Even for the benign case of positive-definite quadratic functions with permutation-invariant Hessians, previous efforts have failed to demonstrate a provable performance gap between RCD and RPCD. To this end, we present novel results showing that, for a class of quadratics with permutation-invariant structures, the contraction rate upper bound for RPCD is always strictly smaller than the contraction rate lower bound for RCD for every individual problem instance.…
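The two index-selection rules described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the exact coordinate-minimization step, and the test matrix (unit diagonal, every off-diagonal entry equal to `1 - sigma`) are assumptions chosen to give a simple positive-definite quadratic with a permutation-invariant Hessian. The function class analyzed in the paper may differ.

```python
import random

def make_hessian(n, sigma):
    # Assumed example: unit diagonal, off-diagonal entries 1 - sigma.
    # Positive definite for 0 < sigma <= 1, unchanged by any relabeling of coordinates.
    return [[1.0 if i == j else 1.0 - sigma for j in range(n)] for i in range(n)]

def objective(A, x):
    # f(x) = 0.5 * x^T A x, minimized at x = 0.
    n = len(x)
    return 0.5 * sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def coordinate_step(A, x, i):
    # Exact minimization of f along coordinate i (in place).
    grad_i = sum(A[i][j] * x[j] for j in range(len(x)))
    x[i] -= grad_i / A[i][i]

def rcd_epoch(A, x, rng):
    # RCD: n updates, each index drawn uniformly at random with replacement.
    n = len(x)
    for _ in range(n):
        coordinate_step(A, x, rng.randrange(n))

def rpcd_epoch(A, x, rng):
    # RPCD: draw a fresh random permutation, update every index exactly once.
    order = list(range(len(x)))
    rng.shuffle(order)
    for i in order:
        coordinate_step(A, x, i)

def run(epoch_fn, n=10, sigma=0.5, epochs=20, seed=0):
    # Returns the objective value before the first epoch and after each epoch.
    rng = random.Random(seed)
    A = make_hessian(n, sigma)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    values = [objective(A, x)]
    for _ in range(epochs):
        epoch_fn(A, x, rng)
        values.append(objective(A, x))
    return values

if __name__ == "__main__":
    print("RCD  final objective:", run(rcd_epoch)[-1])
    print("RPCD final objective:", run(rpcd_epoch)[-1])
```

A single run like this only shows the mechanics of the two sampling schemes. The paper's claim concerns contraction rate bounds, so comparing the two methods empirically would require averaging over many seeds.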
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Complexity and Algorithms in Graphs
