Natasha 2: Faster Non-Convex Optimization Than SGD

Zeyuan Allen-Zhu

arXiv:1708.08694 · math.OC · June 12, 2018 · 53 cites

TL;DR

This paper introduces Natasha 2, a stochastic algorithm that trains any smooth neural network to an $ε$-approximate local minimum using $O(ε^{-3.25})$ backpropagations, compared with essentially $O(ε^{-4})$ for SGD.

Contribution

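The paper gives a stochastic algorithm, Natasha 2, with a proven $O(ε^{-3.25})$ rate for finding $ε$-approximate local minima of smooth nonconvex functions. It needs only oracle access to stochastic gradients and improves on the previous best rate of essentially $O(ε^{-4})$, attained by SGD.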

Findings

01

Achieves $O(ε^{-3.25})$ complexity for local minima

02

Outperforms SGD's $O(ε^{-4})$ complexity

03

Applicable to any smooth neural network and, more broadly, any smooth nonconvex function
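
To make the gap between the two rates in the findings concrete, the short sketch below compares the leading-order terms $ε^{-4}$ and $ε^{-3.25}$ for a few target accuracies. It is only an illustration: the constants and any logarithmic factors hidden by the $O(\cdot)$ notation are dropped, and the helper names `oracle_calls` and `speedup` are made up for this example rather than taken from the paper.

```python
def oracle_calls(eps: float, exponent: float) -> float:
    """Leading-order count eps**(-exponent); constants and log factors are dropped."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    return eps ** (-exponent)


def speedup(eps: float) -> float:
    """Ratio of the SGD term eps**-4 to the Natasha 2 term eps**-3.25."""
    return oracle_calls(eps, 4.0) / oracle_calls(eps, 3.25)


if __name__ == "__main__":
    # Print both leading-order terms and their ratio for a few accuracies.
    for eps in (1e-1, 1e-2, 1e-3):
        sgd = oracle_calls(eps, 4.0)
        natasha2 = oracle_calls(eps, 3.25)
        print(f"eps={eps:g}  sgd~{sgd:.3g}  natasha2~{natasha2:.3g}  "
              f"ratio~{speedup(eps):.3g}")
```

The ratio is $ε^{-0.75}$, so under these simplifications the advantage grows as the target accuracy tightens.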

Abstract

We design a stochastic algorithm to train any smooth neural network to $ε$-approximate local minima, using $O(ε^{-3.25})$ backpropagations. The best result was essentially $O(ε^{-4})$ by SGD. More broadly, it finds $ε$-approximate local minima of any smooth nonconvex function in rate $O(ε^{-3.25})$, with only oracle access to stochastic gradients.
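
The abstract does not spell out what an $ε$-approximate local minimum is. A convention often used in this literature, assumed here rather than taken from this page, is a point whose gradient norm is at most $ε$ and whose Hessian has no eigenvalue below $-δ$ for some tolerance $δ$. The sketch below checks that condition for a two-variable function with a hand-supplied gradient and Hessian. The function names and the parameter `delta` are hypothetical, and the check is not part of Natasha 2 itself.

```python
import math


def min_eig_2x2(a: float, b: float, c: float) -> float:
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    return 0.5 * (a + c) - math.hypot(0.5 * (a - c), b)


def is_approx_local_min(grad, hess, eps: float, delta: float) -> bool:
    """Assumed definition: ||grad|| <= eps and smallest Hessian eigenvalue >= -delta."""
    gx, gy = grad
    (a, b), (_, c) = hess
    small_gradient = math.hypot(gx, gy) <= eps
    no_sharp_descent = min_eig_2x2(a, b, c) >= -delta
    return small_gradient and no_sharp_descent


if __name__ == "__main__":
    # f(x, y) = x^2 - y^2 has a saddle at the origin: zero gradient, eigenvalue -2.
    print(is_approx_local_min((0.0, 0.0), ((2.0, 0.0), (0.0, -2.0)), 1e-3, 0.1))
    # f(x, y) = x^2 + y^2 has a true minimum at the origin.
    print(is_approx_local_min((0.0, 0.0), ((2.0, 0.0), (0.0, 2.0)), 1e-3, 0.1))
```

The second-order condition is what separates this target from a mere stationary point: the saddle passes the gradient test but fails the eigenvalue test.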

Videos

Natasha 2: Faster Non-convex Optimization Than SGD · YouTube

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques

Methods: Stochastic Gradient Descent