On Optimal Probabilities in Stochastic Coordinate Descent Methods
Peter Richtárik, Martin Takáč

TL;DR
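This paper introduces `NSync`, a parallel coordinate descent method that updates a random subset of coordinates at each iteration, with the subsets chosen non-uniformly. With well-chosen probabilities it can outperform its uniform variant by an order of magnitude, and optimally weighted single-coordinate updates may need fewer iterations than updating all coordinates at every iteration.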
Contribution
The paper proposes a parallel coordinate descent method with non-uniform subset sampling, derives its convergence rates under a strong convexity assumption, and comments on how to assign the sampling probabilities to optimize the resulting bound.
Findings
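`NSync` can outperform its uniform variant by an order of magnitude, in both complexity and practical performance.
Updating a single randomly selected coordinate per iteration, with optimal probabilities, may require fewer iterations than updating all coordinates at every iteration, both in theory and in practice.
Convergence rates are derived under a strong convexity assumption and depend on the probabilities assigned to the coordinate subsets.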
Abstract
We propose and analyze a new parallel coordinate descent method, `NSync`, in which at each iteration a random subset of coordinates is updated, in parallel, allowing for the subsets to be chosen non-uniformly. We derive convergence rates under a strong convexity assumption, and comment on how to assign probabilities to the sets to optimize the bound. The complexity and practical performance of the method can outperform its uniform variant by an order of magnitude. Surprisingly, the strategy of updating a single randomly selected coordinate per iteration, with optimal probabilities, may require fewer iterations, both in theory and practice, than the strategy of updating all coordinates at every iteration.
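To make the single-coordinate strategy concrete, here is a minimal sketch of coordinate descent with non-uniform sampling on a strongly convex quadratic. It is an illustration under stated assumptions, not the paper's algorithm or experiments: the function names `make_problem` and `nonuniform_coordinate_descent`, the test problem, and the choice of probabilities proportional to the diagonal of `A` (a common heuristic in the coordinate descent literature) are all assumptions introduced here. The paper's own optimal probability assignment is not reproduced in this summary.

```python
import numpy as np


def make_problem(n=20, seed=0):
    """Build a hypothetical, badly scaled strongly convex quadratic.

    f(x) = 0.5 * x^T A x - b^T x, with A symmetric positive definite.
    """
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n, n))
    scales = np.logspace(0, 2, n)          # diagonal entries spread over 1..100
    A = M @ M.T / n + np.diag(scales)      # PSD part plus positive diagonal
    b = rng.standard_normal(n)
    return A, b


def nonuniform_coordinate_descent(A, b, probs, num_iters, seed=0):
    """Update one randomly sampled coordinate per iteration.

    Coordinate i is drawn with probability probs[i] and moved by the
    partial gradient scaled by 1 / A[i, i], which for a quadratic is
    the exact minimizer along that coordinate.
    """
    rng = np.random.default_rng(seed)
    n = len(b)
    x = np.zeros(n)
    diag = np.diag(A)                      # coordinate-wise curvature
    indices = rng.choice(n, size=num_iters, p=probs)
    for i in indices:
        grad_i = A[i] @ x - b[i]           # i-th partial derivative of f
        x[i] -= grad_i / diag[i]
    return x


A, b = make_problem()
# Assumption: sample coordinates in proportion to their curvature A[i, i].
probs = np.diag(A) / np.trace(A)
x = nonuniform_coordinate_descent(A, b, probs, num_iters=20000)
residual = np.linalg.norm(A @ x - b)
```

Passing `probs = np.full(len(b), 1 / len(b))` instead gives the uniform variant, so the two sampling schemes can be compared on the same problem by inspecting `residual` after an equal number of iterations.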
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Complexity and Algorithms in Graphs
