Stochastic Variance Reduction for Nonconvex Optimization
Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabas Poczos, Alex Smola

TL;DR
This paper provides a theoretical analysis of SVRG methods for nonconvex optimization, demonstrating faster convergence than SGD and gradient descent, with linear convergence in certain cases and benefits from mini-batching.
Contribution
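The paper extends the analysis of SVRG, which previously assumed convexity almost exclusively, to nonconvex finite-sum problems. It proves non-asymptotic convergence rates to stationary points and establishes linear convergence to the global optimum on a specific subclass of nonconvex problems.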
Findings
SVRG is provably faster than SGD and gradient descent at reaching stationary points of nonconvex finite-sum problems.
SVRG attains linear convergence to the global optimum on a subclass of nonconvex problems.
Mini-batch SVRG achieves a (theoretical) linear speedup in parallel settings.
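For orientation, the setting behind these findings can be written out as follows. The notation is an assumption, since this summary fixes no symbols: $n$ components $f_i$, step size $\eta$, snapshot point $\tilde{x}$, and mini-batch $I_t$ of size $b$. It follows the standard SVRG formulation and may differ from the paper's exact notation.

```latex
% Finite-sum problem; each f_i may be nonconvex
\min_{x \in \mathbb{R}^d} \; f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x)

% SVRG inner step: sample i_t uniformly from {1, ..., n}
v_t = \nabla f_{i_t}(x_t) - \nabla f_{i_t}(\tilde{x}) + \nabla f(\tilde{x}),
\qquad x_{t+1} = x_t - \eta \, v_t

% Mini-batch variant: sample an index set I_t with |I_t| = b
v_t = \frac{1}{b} \sum_{i \in I_t} \bigl( \nabla f_i(x_t) - \nabla f_i(\tilde{x}) \bigr) + \nabla f(\tilde{x})
```

The estimator $v_t$ is unbiased for $\nabla f(x_t)$, and its variance shrinks as $x_t$ and $\tilde{x}$ approach each other. In the nonconvex setting, "convergence" here means driving $\mathbb{E}\,\|\nabla f(x)\|^2$ toward zero, that is, reaching an approximate stationary point.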
Abstract
We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them. SVRG and related methods have recently surged into prominence for convex optimization given their edge over stochastic gradient descent (SGD); but their theoretical analysis almost exclusively assumes convexity. In contrast, we prove non-asymptotic rates of convergence (to stationary points) of SVRG for nonconvex optimization, and show that it is provably faster than SGD and gradient descent. We also analyze a subclass of nonconvex problems on which SVRG attains linear convergence to the global optimum. We extend our analysis to mini-batch variants of SVRG, showing (theoretical) linear speedup due to mini-batching in parallel settings.
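The SVRG scheme described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `svrg` and `make_problem`, the toy objective (a least-squares term plus a nonconvex regularizer), and the step size and epoch lengths are all hypothetical choices made for the example. The sketch also returns the last iterate for simplicity, whereas the paper's analysis may specify a different output rule.

```python
import numpy as np


def svrg(grad_i, n, x0, step, epochs, inner_steps, seed=0):
    """Plain SVRG loop (illustrative sketch; returns the last iterate)."""
    rng = np.random.default_rng(seed)
    snapshot = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        # Full gradient at the snapshot, computed once per epoch.
        full = np.mean([grad_i(snapshot, i) for i in range(n)], axis=0)
        x = snapshot.copy()
        for _ in range(inner_steps):
            i = rng.integers(n)
            # Variance-reduced estimate: unbiased for the full gradient at x.
            v = grad_i(x, i) - grad_i(snapshot, i) + full
            x = x - step * v
        snapshot = x  # the next epoch starts from the latest iterate
    return snapshot


def make_problem(seed=0, n=50, d=5, lam=0.1):
    """Hypothetical toy problem: f_i(x) = 0.5*(a_i.x - b_i)^2 + lam*sum(x^2/(1+x^2))."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n, d))
    b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

    def grad_i(x, i):
        residual = A[i] @ x - b[i]
        # Gradient of the nonconvex regularizer x^2/(1+x^2) is 2x/(1+x^2)^2.
        return A[i] * residual + lam * 2 * x / (1 + x**2) ** 2

    def full_grad(x):
        return np.mean([grad_i(x, i) for i in range(n)], axis=0)

    return grad_i, full_grad, n, d


if __name__ == "__main__":
    grad_i, full_grad, n, d = make_problem()
    x0 = np.zeros(d)
    x = svrg(grad_i, n, x0, step=0.01, epochs=30, inner_steps=n)
    print("initial gradient norm:", np.linalg.norm(full_grad(x0)))
    print("final gradient norm:  ", np.linalg.norm(full_grad(x)))
```

The gradient norm is the quantity to watch here, since for nonconvex problems the guarantees concern stationary points rather than function-value gaps. A mini-batch variant would replace the single index `i` with a sampled index set and average the gradient differences over it.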
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods
Methods: Stochastic Gradient Descent
