Semi-Stochastic Coordinate Descent

Jakub Konečný; Zheng Qu; Peter Richtárik

arXiv:1412.6293·cs.NA·December 22, 2014

TL;DR

The paper introduces semi-stochastic coordinate descent (S2CD), a new optimization method combining deterministic and stochastic steps, which efficiently minimizes strongly convex functions represented as averages of many smooth convex functions.

Contribution

It presents the S2CD algorithm, whose stochastic steps update a single coordinate using partial derivatives of a randomly chosen $f_{i}$ at two points, and analyzes its complexity in terms of a new condition number $\hat{\kappa}$.

Findings

01

Achieves $O(n \log(1/\epsilon))$ gradient evaluations.

02

Achieves $O(\hat{\kappa} \log(1/\epsilon))$ partial derivative evaluations.

03

Progressively improves the stochastic gradient estimate.

Abstract

We propose a novel stochastic gradient method, semi-stochastic coordinate descent (S2CD), for the problem of minimizing a strongly convex function represented as the average of a large number of smooth convex functions: $f(x) = \frac{1}{n} \sum_{i} f_{i}(x)$. Our method first performs a deterministic step (computation of the gradient of $f$ at the starting point), followed by a large number of stochastic steps. The process is repeated a few times, with the last stochastic iterate becoming the new starting point where the deterministic step is taken. The novelty of our method is in how the stochastic steps are performed. In each such step, we pick a random function $f_{i}$ and a random coordinate $j$, both using nonuniform distributions, and update a single coordinate of the decision vector only, based on the computation of the $j^{\text{th}}$ partial derivative of $f_{i}$ at two different points.…
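The loop structure described in the abstract (one full-gradient step, then many single-coordinate stochastic steps that evaluate a partial derivative of $f_{i}$ at two points) can be illustrated with a minimal sketch. Everything specific here is an assumption and not taken from the paper: the ridge-regularized least-squares loss, the uniform sampling of $i$ and $j$ (the paper uses nonuniform distributions), the fixed inner-loop length `m`, the step size `h`, and the helper names `partial`, `full_gradient`, and `s2cd_sketch`.

```python
import random


def partial(a_i, b_i, x, j, lam):
    """j-th partial derivative of f_i(x) = 0.5*(a_i.x - b_i)^2 + 0.5*lam*||x||^2."""
    residual = sum(a * xk for a, xk in zip(a_i, x)) - b_i
    return residual * a_i[j] + lam * x[j]


def full_gradient(A, b, x, lam):
    """Gradient of f(x) = (1/n) * sum_i f_i(x), computed coordinate by coordinate."""
    n, d = len(A), len(x)
    return [sum(partial(A[i], b[i], x, j, lam) for i in range(n)) / n
            for j in range(d)]


def s2cd_sketch(A, b, lam=0.5, epochs=40, m=2000, seed=0):
    """Illustrative semi-stochastic coordinate descent loop (assumed variant)."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    # Assumed conservative step size based on the largest smoothness constant.
    l_max = max(sum(a * a for a in row) for row in A) + lam
    h = 0.1 / (d * l_max)
    y = [0.0] * d
    for _ in range(epochs):
        g = full_gradient(A, b, y, lam)  # deterministic step at the starting point
        x = list(y)
        for _ in range(m):               # stochastic steps
            i = rng.randrange(n)         # random function f_i (uniform: an assumption)
            j = rng.randrange(d)         # random coordinate j (uniform: an assumption)
            # j-th partial derivative of f_i at two points: current x and start y.
            correction = (partial(A[i], b[i], x, j, lam)
                          - partial(A[i], b[i], y, j, lam))
            # Multiply by d = 1/p_j so the expected step equals h * grad f(x).
            x[j] -= h * d * (g[j] + correction)
        y = x                            # last iterate becomes the new starting point
    return y
```

Called as `s2cd_sketch(A, b)` with `A` a list of rows and `b` a list of targets, the function returns an approximate minimizer. Only one coordinate changes per inner step, and each inner step needs two partial-derivative evaluations, while the full gradient is computed once per outer pass.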
