Natasha 2: Faster Non-Convex Optimization Than SGD
Zeyuan Allen-Zhu

TL;DR
This paper introduces Natasha 2, a stochastic algorithm that finds approximate local minima of smooth neural networks, and more generally of smooth nonconvex functions, using fewer backpropagations than SGD.
Contribution
The paper presents Natasha 2, a stochastic algorithm with a provable convergence rate for finding approximate local minima in non-convex optimization that improves on the best previously known rate, which was essentially that of SGD.
Findings
Achieves $O(\varepsilon^{-3.25})$ complexity for finding $\varepsilon$-approximate local minima
Outperforms SGD's $O(\varepsilon^{-4})$ complexity
Applicable to any smooth neural network and nonconvex function
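To make the gap between the two rates concrete, the following is a minimal sketch that compares the bounds $\varepsilon^{-4}$ and $\varepsilon^{-3.25}$ for a few accuracy levels. It assumes all constants and logarithmic factors are dropped, so the numbers indicate only how the ratio scales with $\varepsilon$, not actual backpropagation counts. The function names `backprop_bound` and `speedup_over_sgd` are hypothetical and chosen for illustration.

```python
def backprop_bound(eps: float, exponent: float) -> float:
    """Return eps**(-exponent): the stated rate with constants and log factors dropped."""
    return eps ** (-exponent)


def speedup_over_sgd(eps: float) -> float:
    """Ratio of the SGD bound eps^-4 to the Natasha 2 bound eps^-3.25 (equals eps^-0.75)."""
    return backprop_bound(eps, 4.0) / backprop_bound(eps, 3.25)


if __name__ == "__main__":
    # Illustrative accuracy levels only; these are not taken from the paper.
    for eps in (1e-1, 1e-2, 1e-3, 1e-4):
        print(f"eps={eps:g}  bound ratio ~ {speedup_over_sgd(eps):.1f}")
```

The ratio grows as $\varepsilon^{-0.75}$, so under these assumptions the advantage widens as the target accuracy tightens.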
Abstract
We design a stochastic algorithm to train any smooth neural network to $\varepsilon$-approximate local minima, using $O(\varepsilon^{-3.25})$ backpropagations. The best result was essentially $O(\varepsilon^{-4})$ by SGD. More broadly, it finds $\varepsilon$-approximate local minima of any smooth nonconvex function in rate $O(\varepsilon^{-3.25})$, with only oracle access to stochastic gradients.
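The abstract does not spell out what an approximate local minimum is. One common formalization, assumed here and not quoted from the paper, requires a small gradient norm together with a Hessian that has no strongly negative eigenvalue. The sketch below checks that condition on a toy two-dimensional function $f(x, y) = x^4/4 - x^2/2 + y^2/2$, which has a saddle point at the origin and minima at $(\pm 1, 0)$. The toy function, the tolerance `delta`, and all function names are hypothetical choices made for illustration.

```python
import math


def grad(x: float, y: float) -> tuple[float, float]:
    """Gradient of the toy function f(x, y) = x^4/4 - x^2/2 + y^2/2."""
    return (x**3 - x, y)


def hessian(x: float, y: float) -> tuple[tuple[float, float], tuple[float, float]]:
    """Hessian of the same toy function (diagonal for this particular f)."""
    return ((3 * x**2 - 1, 0.0), (0.0, 1.0))


def min_eig_sym2(h) -> float:
    """Smallest eigenvalue of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    (a, b), (_, c) = h
    return (a + c) / 2 - math.sqrt(((a - c) / 2) ** 2 + b**2)


def is_approx_local_min(x: float, y: float, eps: float, delta: float) -> bool:
    """Assumed criterion: ||grad f|| <= eps and lambda_min(Hessian) >= -delta."""
    gx, gy = grad(x, y)
    return math.hypot(gx, gy) <= eps and min_eig_sym2(hessian(x, y)) >= -delta


if __name__ == "__main__":
    # The origin is stationary but a saddle; (1, 0) is a genuine local minimum.
    print(is_approx_local_min(0.0, 0.0, eps=1e-3, delta=0.1))
    print(is_approx_local_min(1.0, 0.0, eps=1e-3, delta=0.1))
```

The gradient test alone would accept the saddle at the origin. The curvature test is what separates approximate local minima from merely stationary points.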
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques
Methods: Stochastic Gradient Descent
