Variance Reduction for Faster Non-Convex Optimization

Zeyuan Allen-Zhu; Elad Hazan

arXiv:1603.05643 · math.OC · August 26, 2016 · 126 cites

TL;DR

This paper adapts the variance reduction technique from convex optimization to non-convex objectives through a new analysis, reaches a stationary point faster than full gradient descent and stochastic gradient descent, and demonstrates the method on empirical risk minimization and neural network training.

Contribution

It presents the first variance reduction-based stochastic method for non-convex optimization with an improved $O(1/\varepsilon)$ convergence rate.

Findings

01

Achieves $O(1/\varepsilon)$ convergence rate for sum-of-smooth-functions objectives.

02

Faster than full gradient descent by a factor of $\Omega(n^{1/3})$.

03

Effective in empirical risk minimization and neural network training.
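
Findings 01 and 02 can be combined into a rough cost estimate. This is a reading under a stated assumption, not a statement taken from the listing above: cost is counted in component-gradient evaluations, and $n$ denotes the number of smooth functions in the sum. Full gradient descent evaluates all $n$ component gradients per iteration and needs $O(1/\varepsilon)$ iterations, so a speedup of $\Omega(n^{1/3})$ corresponds to:

$$
\underbrace{n \cdot O\!\left(\frac{1}{\varepsilon}\right)}_{\text{full gradient descent}} = O\!\left(\frac{n}{\varepsilon}\right)
\qquad\Longrightarrow\qquad
\frac{O(n/\varepsilon)}{\Omega(n^{1/3})} = O\!\left(\frac{n^{2/3}}{\varepsilon}\right).
$$

For example, with $n = 10^{6}$ the factor $n^{1/3}$ is $100$.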

Abstract

We consider the fundamental problem in non-convex optimization of efficiently reaching a stationary point. In contrast to the convex case, in the long history of this basic problem, the only known theoretical results on first-order non-convex optimization remain to be full gradient descent that converges in $O(1/\varepsilon)$ iterations for smooth objectives, and stochastic gradient descent that converges in $O(1/\varepsilon^{2})$ iterations for objectives that are sum of smooth functions. We provide the first improvement in this line of research. Our result is based on the variance reduction trick recently introduced to convex optimization, as well as a brand new analysis of variance reduction that is suitable for non-convex optimization. For objectives that are sum of smooth functions, our first-order minibatch stochastic method converges with an $O(1/\varepsilon)$ rate, and is faster…
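
To make the variance reduction trick mentioned in the abstract concrete, here is a minimal sketch of an SVRG-style update on a small non-convex sum of smooth functions. It is an illustration under assumptions, not the paper's exact algorithm or parameter choices: the component loss $r^2/(1+r^2)$, the step size, the epoch length, and the function names (`component_grads`, `full_grad`, `svrg`) are all hypothetical choices made for this example.

```python
import numpy as np

def component_grads(x, A, b, idx):
    """Gradients of the illustrative non-convex losses
    f_i(x) = r_i^2 / (1 + r_i^2), with r_i = a_i . x - b_i."""
    r = A[idx] @ x - b[idx]
    return (2.0 * r / (1.0 + r**2) ** 2)[:, None] * A[idx]

def full_grad(x, A, b):
    """Gradient of f(x) = (1/n) * sum_i f_i(x)."""
    return component_grads(x, A, b, np.arange(len(b))).mean(axis=0)

def svrg(A, b, x0, step=0.05, epochs=30, inner_steps=None, batch=1, seed=0):
    """SVRG-style loop; step, epochs, and batch are assumed, untuned values."""
    rng = np.random.default_rng(seed)
    n = len(b)
    m = inner_steps or n  # assumed epoch length: one pass worth of updates
    x = x0.copy()
    for _ in range(epochs):
        snapshot = x.copy()
        mu = full_grad(snapshot, A, b)  # full gradient, computed once per epoch
        for _ in range(m):
            idx = rng.integers(0, n, size=batch)
            # Variance-reduced estimator: unbiased for the full gradient at x,
            # with variance that shrinks as x stays close to the snapshot.
            g = (component_grads(x, A, b, idx)
                 - component_grads(snapshot, A, b, idx)).mean(axis=0) + mu
            x = x - step * g
    return x

# Small synthetic demo (made-up data, rows normalized so each f_i is 2-smooth).
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 5))
A /= np.linalg.norm(A, axis=1, keepdims=True)
b = A @ rng.normal(size=5)
x0 = np.zeros(5)
x_out = svrg(A, b, x0)
print("gradient norm at start:", np.linalg.norm(full_grad(x0, A, b)))
print("gradient norm at end:  ", np.linalg.norm(full_grad(x_out, A, b)))
```

The gradient norm is printed because the abstract measures progress by reaching a stationary point, not by reaching a global minimum. The `batch` argument allows a minibatch variant of the same estimator.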

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning