Natasha: Faster Non-Convex Stochastic Optimization Via Strongly Non-Convex Parameter

Zeyuan Allen-Zhu

arXiv:1702.00763·math.OC·September 28, 2018·34 cites

TL;DR

This paper introduces stochastic optimization methods tailored for nonconvex functions, with convergence rates that adapt based on the function's degree of nonconvexity, characterized by the Hessian's smallest eigenvalue.

Contribution

The author proposes new stochastic first-order algorithms whose convergence depends on the smallest (negative) eigenvalue $-σ$ of the Hessian, and which outperform known results for a range of the parameter $σ$.

Findings

01

The methods outperform known results for a range of the nonconvexity parameter $σ$ and can be used to find approximate local minima.

02

Convergence rates depend on the eigenvalue parameter $σ$, showing a dichotomy at a threshold $σ_{0}$.

03

The currently fastest methods scale differently in the two regimes: with $n^{2/3}$ for $σ > σ_{0}$ and with $n^{3/4}$ for $σ < σ_{0}$.

Abstract

Given a nonconvex function that is an average of $n$ smooth functions, we design stochastic first-order methods to find its approximate stationary points. The convergence of our new methods depends on the smallest (negative) eigenvalue $-σ$ of the Hessian, a parameter that describes how nonconvex the function is. Our methods outperform known results for a range of parameter $σ$, and can be used to find approximate local minima. Our result implies an interesting dichotomy: there exists a threshold $σ_{0}$ so that the currently fastest methods for $σ > σ_{0}$ and for $σ < σ_{0}$ have different behaviors: the former scales with $n^{2/3}$ and the latter scales with $n^{3/4}$.
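
To make the parameter $σ$ concrete, here is a minimal sketch, not taken from the paper, that computes it for a toy finite-sum objective. It assumes each component is a quadratic $f_i(x) = \frac{1}{2} x^\top A_i x$ with a symmetric matrix $A_i$, so the Hessian of the average is the constant matrix $\frac{1}{n}\sum_i A_i$. The function names (`average_hessian`, `nonconvexity_parameter`, `full_gradient`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def average_hessian(mats):
    """Hessian of f(x) = (1/n) * sum_i 0.5 * x^T A_i x (constant in x)."""
    return sum(mats) / len(mats)


def nonconvexity_parameter(hessian):
    """Return sigma >= 0 such that the smallest Hessian eigenvalue is -sigma.

    Assumption for this sketch: if the Hessian has no negative eigenvalue,
    we report sigma = 0 (the toy function is convex).
    """
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    smallest = np.linalg.eigvalsh(hessian)[0]
    return max(0.0, -float(smallest))


def full_gradient(mats, x):
    """Gradient of the averaged quadratic objective at x."""
    return average_hessian(mats) @ x


# Toy instance with n = 2 components in dimension 2 (illustrative values).
components = [np.diag([1.0, -3.0]), np.diag([1.0, 1.0])]
hessian = average_hessian(components)        # equals diag(1, -1)
sigma = nonconvexity_parameter(hessian)      # smallest eigenvalue is -1

# x = 0 is a stationary point of this toy objective: its gradient vanishes.
grad_norm_at_origin = np.linalg.norm(full_gradient(components, np.zeros(2)))
```

In this toy instance one component is itself nonconvex, and the average still has a negative eigenvalue, so $σ = 1$. For a general smooth function the Hessian varies with $x$, and $σ$ would have to bound the smallest eigenvalue over the whole domain rather than at a single point.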

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods