Accelerated Coordinate Descent with Arbitrary Sampling and Best Rates for Minibatches

Filip Hanzely; Peter Richtárik

arXiv:1809.09354·math.OC·October 1, 2018·AISTATS·6 cites

TL;DR

This paper introduces a flexible accelerated coordinate descent method with arbitrary sampling, providing new importance sampling strategies that outperform previous minibatch approaches, leading to faster convergence especially in large-scale machine learning tasks.

Contribution

The paper designs and analyzes a generalized accelerated coordinate descent method with arbitrary sampling, and develops new importance sampling techniques that significantly improve minibatch optimization performance.

Findings

01. New importance sampling for minibatch ACD outperforms uniform sampling.

02. The method's convergence rate can be much faster than previous approaches, especially for small minibatch sizes.

03. Similar improvements are achieved for nonaccelerated coordinate descent.

Abstract

Accelerated coordinate descent is a widely popular optimization algorithm due to its efficiency on large-dimensional problems. It achieves state-of-the-art complexity on an important class of empirical risk minimization problems. In this paper we design and analyze an accelerated coordinate descent (ACD) method which in each iteration updates a random subset of coordinates according to an arbitrary but fixed probability law, which is a parameter of the method. If all coordinates are updated in each iteration, our method reduces to the classical accelerated gradient descent method AGD of Nesterov. If a single coordinate is updated in each iteration, and we pick probabilities proportional to the square roots of the coordinate-wise Lipschitz constants, our method reduces to the currently fastest coordinate descent method NUACDM of Allen-Zhu, Qu, Richtárik and Yuan. While mini-batch…
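The single-coordinate sampling rule mentioned in the abstract (probabilities proportional to the square roots of the coordinate-wise Lipschitz constants) can be illustrated with a minimal sketch. This is not the paper's ACD method: the rule is applied here to a plain, non-accelerated coordinate step on a toy quadratic, purely to show how such a sampling law is built and used. The function names, the matrix `A`, and the vector `b` are hypothetical; for f(x) = 0.5 xᵀAx − bᵀx the coordinate-wise Lipschitz constants are the diagonal entries of A.

```python
import math
import random


def sqrt_lipschitz_probabilities(lipschitz):
    """Return p_i proportional to sqrt(L_i), normalized to sum to 1."""
    roots = [math.sqrt(L) for L in lipschitz]
    total = sum(roots)
    return [r / total for r in roots]


def coordinate_descent(A, b, probs, steps, seed=0):
    """Plain (non-accelerated) coordinate descent on 0.5 x^T A x - b^T x.

    Each iteration samples one coordinate i with probability probs[i]
    and takes a step of size 1 / L_i along it, where L_i = A[i][i].
    """
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    coords = list(range(n))
    for _ in range(steps):
        i = rng.choices(coords, weights=probs, k=1)[0]
        # Partial derivative of the quadratic with respect to x_i.
        grad_i = sum(A[i][j] * x[j] for j in range(n)) - b[i]
        x[i] -= grad_i / A[i][i]
    return x


# Toy data (assumption): a small symmetric positive definite matrix.
A = [[4.0, 1.0, 0.0],
     [1.0, 9.0, 1.0],
     [0.0, 1.0, 1.0]]
b = [1.0, 2.0, 3.0]

lipschitz = [A[i][i] for i in range(len(b))]
probs = sqrt_lipschitz_probabilities(lipschitz)
x = coordinate_descent(A, b, probs, steps=2000)

# Residual A x - b; it should be close to zero at the minimizer.
residual = [sum(A[i][j] * x[j] for j in range(len(b))) - b[i]
            for i in range(len(b))]
```

Replacing `probs` with a uniform law gives the baseline that importance sampling is compared against. The minibatch samplings and the acceleration scheme analyzed in the paper are not reproduced here.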

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Sparse and Compressive Sensing Techniques