Coordinate Descent Converges Faster with the Gauss-Southwell Rule Than Random Selection
Julie Nutini, Mark Schmidt, Issam H. Laradji, Michael Friedlander, Hoyt Koepke

TL;DR
This paper shows that the Gauss-Southwell rule for coordinate descent converges faster than random coordinate selection except in extreme cases. That makes it the better choice when the two rules cost about the same per iteration. The paper also introduces variants that improve convergence further.
Contribution
It gives a simple analysis showing that the Gauss-Southwell rule converges faster than random selection, and proposes new selection rules and variants that improve performance.
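The core of that comparison can be written out as a short derivation. This is a sketch under stated assumptions, not a reproduction of the paper's proofs. Assume every partial derivative is L-Lipschitz (one uniform constant L), coordinate steps have size 1/L, f is μ-strongly convex in the Euclidean norm, and μ₁ is its strong-convexity constant in the ℓ1-norm. These symbols are introduced here for illustration.

```latex
% Random selection (Nesterov, 2012), in expectation over the chosen coordinate:
\mathbb{E}\left[f(x^{k+1})\right] - f^* \le \left(1 - \frac{\mu}{nL}\right)\left[f(x^k) - f^*\right]

% Gauss-Southwell, i_k = \arg\max_i |\nabla_i f(x^k)|: the step guarantees
f(x^{k+1}) \le f(x^k) - \frac{1}{2L}\,\|\nabla f(x^k)\|_\infty^2 ,
% and strong convexity in the l1-norm (whose dual norm is l-infinity) gives
f^* \ge f(x^k) - \frac{1}{2\mu_1}\,\|\nabla f(x^k)\|_\infty^2 .
% Combining the two inequalities:
f(x^{k+1}) - f^* \le \left(1 - \frac{\mu_1}{L}\right)\left[f(x^k) - f^*\right],
\qquad \frac{\mu}{n} \le \mu_1 \le \mu .
```

Because μ₁ ≥ μ/n, the Gauss-Southwell bound is never worse than the random-selection bound. The two bounds coincide only in the extreme case μ₁ = μ/n. When μ₁ is close to μ, the contraction term μ₁/L can be up to n times larger than μ/(nL).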
Findings
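The Gauss-Southwell rule converges faster than random selection, except in extreme cases.
Exact coordinate optimization improves the convergence rate for certain sparse problems.
A Lipschitz-aware Gauss-Southwell rule (Gauss-Southwell-Lipschitz) gives an even faster rate when the Lipschitz constants of the partial derivatives are known.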
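A Lipschitz-aware selection rule can be sketched as follows. This assumes each partial derivative ∇ᵢf has a known Lipschitz constant Lᵢ and that the update on coordinate i is the step xᵢ ← xᵢ − ∇ᵢf/Lᵢ. That step guarantees a decrease of at least (∇ᵢf)²/(2Lᵢ), so picking argmaxᵢ |∇ᵢf|/√Lᵢ selects the coordinate with the largest guaranteed decrease. Plain Gauss-Southwell ignores Lᵢ. The function names below are hypothetical.

```python
import math

def gauss_southwell(grad):
    """Gauss-Southwell: pick the coordinate with the largest |partial derivative|."""
    return max(range(len(grad)), key=lambda i: abs(grad[i]))

def gauss_southwell_lipschitz(grad, lipschitz):
    """Lipschitz-aware Gauss-Southwell: pick the largest |grad_i| / sqrt(L_i).

    lipschitz[i] is assumed to be a Lipschitz constant of the i-th partial derivative.
    """
    return max(range(len(grad)), key=lambda i: abs(grad[i]) / math.sqrt(lipschitz[i]))

def guaranteed_decrease(grad, lipschitz, i):
    """Lower bound on the decrease from the step x_i <- x_i - grad_i / L_i."""
    return grad[i] ** 2 / (2.0 * lipschitz[i])

# One steep coordinate (large L_0) and one flat coordinate (small L_1).
grad, lips = [3.0, 2.0], [100.0, 1.0]
i_gs = gauss_southwell(grad)                   # 0: larger raw gradient entry
i_gsl = gauss_southwell_lipschitz(grad, lips)  # 1: larger grad_i**2 / L_i
print(guaranteed_decrease(grad, lips, i_gs), guaranteed_decrease(grad, lips, i_gsl))
```

When all Lᵢ are equal, the two rules pick the same coordinate. For a quadratic ½xᵀAx − bᵀx with Lᵢ = Aᵢᵢ, the step 1/Lᵢ is exactly the minimizer along coordinate i.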
Abstract
There has been significant recent work on the theory and application of randomized coordinate descent algorithms, beginning with the work of Nesterov [SIAM J. Optim., 22(2), 2012], who showed that a random-coordinate selection rule achieves the same convergence rate as the Gauss-Southwell selection rule. This result suggests that we should never use the Gauss-Southwell rule, as it is typically much more expensive than random selection. However, the empirical behaviours of these algorithms contradict this theoretical result: in applications where the computational costs of the selection rules are comparable, the Gauss-Southwell selection rule tends to perform substantially better than random coordinate selection. We give a simple analysis of the Gauss-Southwell rule showing that, except in extreme cases, its convergence rate is faster than choosing random coordinates. Further, in this…
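The two selection rules can be compared in a minimal pure-Python sketch on a small strongly convex quadratic f(x) = ½xᵀAx − bᵀx. Several parts of this sketch are illustrative assumptions and not the paper's experimental setup: the name `coordinate_descent`, the fixed step 1/L with L = maxᵢ Aᵢᵢ, and the test problem. The sketch maintains the full gradient anyway, so the Gauss-Southwell argmax adds only O(n) work per step. That is one example of the comparable-cost setting the abstract describes.

```python
import random

def coordinate_descent(A, b, iters, rule="gs", seed=0):
    """Minimize f(x) = 0.5 * x^T A x - b^T x by coordinate descent.

    A is assumed symmetric positive definite (list of lists). Each step moves
    one coordinate with step size 1/L, where L = max_i A[i][i] bounds every
    coordinate-wise Lipschitz constant. rule="gs" picks the largest |gradient|
    entry (Gauss-Southwell); rule="random" picks a coordinate uniformly.
    Returns the final iterate and the objective value after each step.
    """
    n = len(b)
    rng = random.Random(seed)
    L = max(A[i][i] for i in range(n))
    x = [0.0] * n
    g = [-bi for bi in b]  # gradient A x - b at the starting point x = 0

    def f(v):
        quad = sum(v[i] * A[i][j] * v[j] for i in range(n) for j in range(n))
        return 0.5 * quad - sum(bi * vi for bi, vi in zip(b, v))

    history = [f(x)]
    for _ in range(iters):
        if rule == "gs":
            i = max(range(n), key=lambda j: abs(g[j]))
        elif rule == "random":
            i = rng.randrange(n)
        else:
            raise ValueError(f"unknown rule: {rule}")
        delta = -g[i] / L
        x[i] += delta
        # Changing x_i only shifts the gradient by delta * (column i of A).
        for j in range(n):
            g[j] += delta * A[j][i]
        history.append(f(x))
    return x, history

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
_, hist_gs = coordinate_descent(A, b, 30, rule="gs")
_, hist_rand = coordinate_descent(A, b, 30, rule="random", seed=1)
print(f"after 30 steps: GS f={hist_gs[-1]:.6f}, random f={hist_rand[-1]:.6f}")
```

With step 1/L, every coordinate step is guaranteed not to increase f, whichever rule chose the coordinate. The random run depends on `seed`, so a single run illustrates the comparison but does not prove it.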
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms
