Random Permutations Fix a Worst Case for Cyclic Coordinate Descent
Ching-Pei Lee, Stephen J. Wright

TL;DR
This paper analyzes the convergence of coordinate descent under different coordinate orderings. It shows that applying a fresh random permutation at the start of each cycle fixes the slow behavior of cyclic coordinate descent on a class of quadratic functions.
Contribution
It provides a tight analysis explaining why random-permutations cyclic coordinate descent (RPCD) outperforms cyclic coordinate descent (CCD) on a specific class of quadratic functions.
Findings
RPCD outperforms CCD on certain quadratic functions
RPCD performs better than randomized coordinate descent (RCD) in some regimes
The paper offers a theoretical explanation for RPCD's effectiveness
Abstract
Variants of the coordinate descent approach for minimizing a nonlinear function are distinguished in part by the order in which coordinates are considered for relaxation. Three common orderings are cyclic (CCD), in which we cycle through the components of x in order; randomized (RCD), in which the component to update is selected randomly and independently at each iteration; and random-permutations cyclic (RPCD), which differs from CCD only in that a random permutation is applied to the variables at the start of each cycle. Known convergence guarantees are weaker for CCD and RPCD than for RCD, though in most practical cases, computational performance is similar among all these variants. There is a certain type of quadratic function for which CCD is significantly slower than RCD; a recent paper by Sun and Ye [SunY16a] has explored the poor behavior of CCD on functions of this type. The…
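The three orderings in the abstract differ only in how the index sequence for each pass of n single-coordinate updates is generated. The Python below is a minimal sketch that implements CCD, RCD, and RPCD side by side. It assumes exact minimization along each coordinate of a convex quadratic f(x) = 0.5 * x^T A x, which gives the update x_i <- x_i - (Ax)_i / A_ii. The test matrix A = delta*I + (1 - delta)*11^T is an assumption chosen as an illustrative ill-conditioned quadratic. It is not claimed to be the exact function class analyzed in the paper. The names make_quadratic, objective, and coordinate_descent are hypothetical.

```python
import random
from typing import List, Optional

Matrix = List[List[float]]


def make_quadratic(n: int, delta: float) -> Matrix:
    """Hypothetical test matrix A = delta*I + (1 - delta)*11^T (an assumption).

    Unit diagonal, every off-diagonal entry equal to 1 - delta; positive
    definite for 0 < delta < 1 and ill-conditioned when delta is small.
    """
    return [[1.0 if i == j else 1.0 - delta for j in range(n)] for i in range(n)]


def objective(A: Matrix, x: List[float]) -> float:
    """f(x) = 0.5 * x^T A x."""
    n = len(x)
    return 0.5 * sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def coordinate_descent(
    A: Matrix,
    x0: List[float],
    epochs: int,
    order: str,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Exact coordinate minimization of f(x) = 0.5 * x^T A x.

    order selects how each epoch of n single-coordinate updates is indexed:
      "ccd":  fixed cyclic order 0, 1, ..., n-1
      "rcd":  n indices drawn uniformly and independently (with replacement)
      "rpcd": a fresh random permutation of 0..n-1 at the start of each cycle
    Returns f(x) after each epoch; x0 itself is not modified.
    """
    if rng is None:
        rng = random.Random(0)
    n = len(x0)
    x = [float(v) for v in x0]
    history: List[float] = []
    for _ in range(epochs):
        if order == "ccd":
            indices = list(range(n))
        elif order == "rcd":
            indices = [rng.randrange(n) for _ in range(n)]
        elif order == "rpcd":
            indices = list(range(n))
            rng.shuffle(indices)
        else:
            raise ValueError(f"unknown order: {order!r}")
        for i in indices:
            # Partial derivative (Ax)_i at the current iterate.
            grad_i = sum(A[i][j] * x[j] for j in range(n))
            # Setting the derivative along coordinate i to zero gives the exact minimizer.
            x[i] -= grad_i / A[i][i]
        history.append(objective(A, x))
    return history


if __name__ == "__main__":
    n = 50
    A = make_quadratic(n, delta=1e-2)
    start = random.Random(42)
    x0 = [start.gauss(0.0, 1.0) for _ in range(n)]
    for order in ("ccd", "rcd", "rpcd"):
        hist = coordinate_descent(A, x0, epochs=30, order=order, rng=random.Random(1))
        print(f"{order}: f after {len(hist)} epochs = {hist[-1]:.3e}")
```

Counting an epoch as n single-coordinate updates gives the three orderings equal per-epoch cost. Under RCD, some coordinates may be updated more than once in an epoch and others not at all. CCD and RPCD update each coordinate exactly once per epoch. The sketch makes no claim about which ordering wins on any particular run.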
