Stochastic Cubic Regularization for Fast Nonconvex Optimization

Nilesh Tripuraneni; Mitchell Stern; Chi Jin; Jeffrey Regier; Michael I. Jordan

arXiv:1711.02838 · cs.LG · December 7, 2017 · 46 citations

TL;DR

This paper introduces a stochastic cubic-regularized Newton method that escapes saddle points and finds approximate local minima of smooth nonconvex functions using fewer stochastic evaluations than stochastic gradient descent.

Contribution

The paper presents a stochastic variant of the cubic-regularized Newton method that matches the best-known rate for finding local minima without requiring delicate acceleration or variance-reduction techniques.

Findings

01

Finds approximate local minima of smooth nonconvex functions in $\tilde{O}(\epsilon^{-3.5})$ stochastic gradient and stochastic Hessian-vector product evaluations.

02

Requires only stochastic gradient and stochastic Hessian-vector product evaluations; the latter can be computed as efficiently as stochastic gradients.

03

Improves upon the $\tilde{O}(\epsilon^{-4})$ rate of stochastic gradient descent.
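The second finding rests on the fact that a Hessian-vector product never requires forming the full Hessian. The sketch below illustrates this with a central difference of two gradient evaluations. This is a swapped-in technique chosen for a dependency-free example (in practice automatic differentiation is the usual route), and the toy function, `grad`, `hvp`, `exact_hvp`, and the step `r` are all hypothetical names and values, not taken from the paper.

```python
def grad(x):
    # Gradient of the assumed toy function f(x) = x0**4/4 + x1**4/4 + x0*x1.
    return [x[0] ** 3 + x[1], x[1] ** 3 + x[0]]


def hvp(grad_fn, x, v, r=1e-4):
    # Central difference of gradients: H(x) v ~ (g(x + r v) - g(x - r v)) / (2 r).
    # Costs two gradient evaluations and never builds the Hessian matrix.
    x_plus = [xi + r * vi for xi, vi in zip(x, v)]
    x_minus = [xi - r * vi for xi, vi in zip(x, v)]
    g_plus, g_minus = grad_fn(x_plus), grad_fn(x_minus)
    return [(gp - gm) / (2 * r) for gp, gm in zip(g_plus, g_minus)]


def exact_hvp(x, v):
    # Closed-form H(x) v for the toy function, used only to check the approximation.
    return [3 * x[0] ** 2 * v[0] + v[1], v[0] + 3 * x[1] ** 2 * v[1]]
```

Comparing `hvp(grad, x, v)` with `exact_hvp(x, v)` at a few points gives a quick sanity check. The finite-difference error grows with `r` and with the curvature of the gradient, so the step size here is an illustrative choice only.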

Abstract

This paper proposes a stochastic variant of a classic algorithm, the cubic-regularized Newton method [Nesterov and Polyak 2006]. The proposed algorithm efficiently escapes saddle points and finds approximate local minima for general smooth, nonconvex functions in only $\tilde{O}(\epsilon^{-3.5})$ stochastic gradient and stochastic Hessian-vector product evaluations. The latter can be computed as efficiently as stochastic gradients. This improves upon the $\tilde{O}(\epsilon^{-4})$ rate of stochastic gradient descent. Our rate matches the best-known result for finding local minima without requiring any delicate acceleration or variance-reduction techniques.
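To make the cubic-regularized Newton idea concrete, here is a minimal deterministic sketch, not the authors' stochastic algorithm. Each outer step approximately minimizes the cubic model $m(d) = g^\top d + \tfrac{1}{2} d^\top H d + \tfrac{\rho}{6}\|d\|^3$ by plain gradient descent, touching the Hessian only through Hessian-vector products. The toy function, the exact (non-minibatch) gradient and Hessian-vector product, and the values of `rho`, `eta`, and the iteration counts are all assumptions made for illustration.

```python
import math


def grad(x):
    # Gradient of the assumed toy function f(x) = x0**2/2 - x1**2/2 + x1**4/4,
    # which has a saddle point at the origin and minima at (0, +1) and (0, -1).
    return [x[0], -x[1] + x[1] ** 3]


def hess_vec(x, v):
    # Exact Hessian-vector product for the toy function (its Hessian is diagonal).
    return [v[0], (-1.0 + 3.0 * x[1] ** 2) * v[1]]


def cubic_step(x, rho, eta=0.05, inner_iters=200):
    # Approximately minimize m(d) = g.d + 0.5 d.H d + (rho/6) ||d||^3
    # by gradient descent on d, starting from d = 0.
    g = grad(x)
    d = [0.0] * len(x)
    for _ in range(inner_iters):
        hd = hess_vec(x, d)
        norm_d = math.sqrt(sum(di * di for di in d))
        # Gradient of the cubic model: g + H d + (rho/2) ||d|| d.
        d = [di - eta * (gi + hdi + 0.5 * rho * norm_d * di)
             for di, gi, hdi in zip(d, g, hd)]
    return d


def cubic_newton(x0, rho=10.0, outer_iters=50):
    # Repeatedly apply the approximate cubic-model minimizer as the update.
    x = list(x0)
    for _ in range(outer_iters):
        d = cubic_step(x, rho)
        x = [xi + di for xi, di in zip(x, d)]
    return x


x_final = cubic_newton([0.5, 1e-3])
```

Started close to the saddle, the iterates should move along the negative-curvature direction toward one of the two minima rather than stalling at the origin. A stochastic version in the spirit of the abstract would replace `grad` and `hess_vec` with minibatch estimates.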

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Complexity and Algorithms in Graphs