Variance Reduction for Faster Non-Convex Optimization
Zeyuan Allen-Zhu, Elad Hazan

TL;DR
This paper introduces a novel variance reduction technique for non-convex optimization, achieving faster convergence rates than traditional methods, and demonstrates its effectiveness on neural network training.
Contribution
It presents the first variance reduction-based stochastic method for non-convex optimization, with an $O(1/\varepsilon)$ convergence rate that improves on the $O(1/\varepsilon^2)$ rate of stochastic gradient descent.
Findings
Achieves $O(1/\varepsilon)$ convergence rate for sum-of-smooth-functions objectives.
Faster than full gradient descent by a factor of $\Omega(n^{1/3})$, where $n$ is the number of smooth functions in the sum.
Effective in empirical risk minimization and neural network training.
Abstract
We consider the fundamental problem in non-convex optimization of efficiently reaching a stationary point. In contrast to the convex case, in the long history of this basic problem, the only known theoretical results on first-order non-convex optimization remain to be full gradient descent that converges in $O(1/\varepsilon)$ iterations for smooth objectives, and stochastic gradient descent that converges in $O(1/\varepsilon^2)$ iterations for objectives that are sum of smooth functions. We provide the first improvement in this line of research. Our result is based on the variance reduction trick recently introduced to convex optimization, as well as a brand new analysis of variance reduction that is suitable for non-convex optimization. For objectives that are sum of smooth functions, our first-order minibatch stochastic method converges with an $O(1/\varepsilon)$ rate, and is faster…
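To make the "variance reduction trick" concrete, here is a minimal SVRG-style sketch in Python. It is an illustration under stated assumptions, not the paper's exact algorithm: the scalar loss $f_i(x) = u^2/(1+u^2)$ with $u = x - a_i$, the toy data, the step size, the epoch length, and every function name are hypothetical choices made for this example, and it samples one component per step rather than a minibatch. The idea it shows is that each stochastic gradient is corrected by the same component's gradient at a periodic snapshot, plus the full gradient at that snapshot, so the estimator stays unbiased while its variance shrinks as the iterate stays close to the snapshot.

```python
import random


def grad_i(x, a):
    """Gradient of the assumed smooth non-convex loss f_i(x) = u^2 / (1 + u^2), u = x - a."""
    u = x - a
    return 2.0 * u / (1.0 + u * u) ** 2


def full_grad(x, data):
    """Gradient of the average objective f(x) = (1/n) * sum_i f_i(x)."""
    return sum(grad_i(x, a) for a in data) / len(data)


def vr_grad(x, snapshot, snapshot_grad, a):
    """Variance-reduced estimate: stochastic gradient, corrected at the snapshot."""
    return grad_i(x, a) - grad_i(snapshot, a) + snapshot_grad


def svrg(x0, data, step=0.1, epochs=20, inner_steps=None, seed=0):
    """SVRG-style loop (illustrative parameters, not the paper's tuned choices)."""
    rng = random.Random(seed)
    m = inner_steps or len(data)
    x = x0
    for _ in range(epochs):
        # One full-gradient pass per epoch at the snapshot point.
        snapshot = x
        snapshot_grad = full_grad(snapshot, data)
        for _ in range(m):
            a = rng.choice(data)  # sample one component uniformly at random
            x -= step * vr_grad(x, snapshot, snapshot_grad, a)
    return x


if __name__ == "__main__":
    data = [-1.0, 0.0, 0.5, 2.0]  # hypothetical toy data
    x = svrg(3.0, data)
    print("x =", x, "|full gradient| =", abs(full_grad(x, data)))
```

Averaging `vr_grad` over all components recovers `full_grad` exactly, which is the unbiasedness property the method relies on. The full gradient is computed only once per epoch, so the per-step cost stays close to that of plain stochastic gradient descent.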
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning
