Stochastic Variance Reduction for Nonconvex Optimization

Sashank J. Reddi; Ahmed Hefny; Suvrit Sra; Barnabas Poczos; Alex Smola

arXiv:1603.06160 · math.OC · April 6, 2016 · 242 cites

Open Access

TL;DR

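This paper analyzes stochastic variance reduced gradient (SVRG) methods for nonconvex finite-sum optimization. It proves non-asymptotic convergence to stationary points at rates faster than SGD and gradient descent, linear convergence to the global optimum on a subclass of nonconvex problems, and a theoretical linear speedup from mini-batching in parallel settings.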

Contribution

It extends the theoretical analysis of SVRG, which previously assumed convexity almost exclusively, to nonconvex problems, proving non-asymptotic convergence rates to stationary points and linear convergence to the global optimum on a specific subclass.
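
For orientation, the setting can be sketched as follows. This is the finite-sum problem and SVRG gradient estimator as they are commonly stated in the literature, not a quotation from this page; the symbols $n$, $f_i$, $\tilde{x}$ (the snapshot point), $i_t$ (a uniformly sampled index) and $\eta$ (the step size) are notation assumed here.

```latex
\min_{x \in \mathbb{R}^d} \; f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x),
\qquad
v_t = \nabla f_{i_t}(x_t) - \nabla f_{i_t}(\tilde{x}) + \nabla f(\tilde{x}),
\qquad
x_{t+1} = x_t - \eta \, v_t .
```

Because $i_t$ is sampled uniformly, $\mathbb{E}[v_t] = \nabla f(x_t)$, so the estimator is unbiased, and the correction terms cancel when $x_t = \tilde{x}$, which is where the variance reduction comes from.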

Findings

01. SVRG provably converges to stationary points faster than SGD and gradient descent on nonconvex finite-sum problems.

02. SVRG attains linear convergence to the global optimum on a subclass of nonconvex problems.

03. Mini-batch SVRG achieves a theoretical linear speedup in parallel settings.

Abstract

We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them. SVRG and related methods have recently surged into prominence for convex optimization given their edge over stochastic gradient descent (SGD); but their theoretical analysis almost exclusively assumes convexity. In contrast, we prove non-asymptotic rates of convergence (to stationary points) of SVRG for nonconvex optimization, and show that it is provably faster than SGD and gradient descent. We also analyze a subclass of nonconvex problems on which SVRG attains linear convergence to the global optimum. We extend our analysis to mini-batch variants of SVRG, showing (theoretical) linear speedup due to mini-batching in parallel settings.
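
The abstract describes SVRG only at a high level, so a small sketch may help make the method concrete. The code below is a minimal illustration of the commonly described SVRG loop (a full gradient at a snapshot, then variance-reduced stochastic steps), applied to a synthetic nonconvex finite-sum objective. Everything specific here is an assumption for illustration and is not taken from the paper: the loss `(sigmoid(a_i @ x) - y_i)^2`, the helper names `make_problem`, `batch_grad` and `svrg`, the step size, the epoch length, and the choice to return the last iterate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_problem(n=200, d=5, seed=0):
    """Synthetic data for the nonconvex loss f_i(x) = (sigmoid(a_i @ x) - y_i)^2."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n, d))
    y = sigmoid(A @ rng.normal(size=d))  # targets generated by a hidden parameter
    return A, y

def loss(x, A, y):
    """Finite-sum objective f(x) = (1/n) * sum_i f_i(x)."""
    return np.mean((sigmoid(A @ x) - y) ** 2)

def batch_grad(x, A, y, idx):
    """Mean gradient of f_i over the indices in idx."""
    s = sigmoid(A[idx] @ x)
    coeff = 2.0 * (s - y[idx]) * s * (1.0 - s)
    return A[idx].T @ coeff / len(idx)

def svrg(A, y, x0, step=0.05, epochs=20, inner_steps=None, batch_size=1, seed=1):
    """Illustrative SVRG loop; hyperparameters are assumptions, not tuned values."""
    n = A.shape[0]
    inner_steps = n if inner_steps is None else inner_steps
    rng = np.random.default_rng(seed)
    all_idx = np.arange(n)
    x = np.array(x0, dtype=float)
    for _ in range(epochs):
        snapshot = x.copy()
        full_grad = batch_grad(snapshot, A, y, all_idx)  # one full pass per epoch
        for _ in range(inner_steps):
            idx = rng.integers(0, n, size=batch_size)    # sample a mini-batch
            # Variance-reduced estimate: unbiased for the full gradient at x.
            v = batch_grad(x, A, y, idx) - batch_grad(snapshot, A, y, idx) + full_grad
            x = x - step * v
    return x

A, y = make_problem()
x0 = np.zeros(A.shape[1])
x_hat = svrg(A, y, x0)
print("initial loss:", loss(x0, A, y), "final loss:", loss(x_hat, A, y))
```

Setting `batch_size` above 1 gives a mini-batch variant in the same spirit as the one the abstract mentions; in a parallel setting the per-sample gradients inside each mini-batch could be computed concurrently.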

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods

Methods: Stochastic Gradient Descent