Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent

Chi Jin; Praneeth Netrapalli; Michael I. Jordan

arXiv:1711.10456 · cs.LG · November 29, 2017 · 50 cites

TL;DR

This paper shows that a simple variant of Nesterov's accelerated gradient descent escapes saddle points faster than standard gradient descent in nonconvex optimization, finding a second-order stationary point in $\tilde{O}(1/\epsilon^{7/4})$ iterations without Hessian computations.

Contribution

The paper introduces a Hessian-free accelerated gradient descent variant with a novel analysis framework, showing faster escape from saddle points in nonconvex optimization.

Findings

01

Escapes saddle points and finds a second-order stationary point in $\tilde{O}(1/\epsilon^{7/4})$ iterations

02

Faster than gradient descent's $\tilde{O}(1/\epsilon^{2})$ iterations

03

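First Hessian-free algorithm to find a second-order stationary point faster than gradient descent, and first single-loop algorithm with a faster rate than gradient descent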
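
As a rough illustration of the gap between the two rates, the arithmetic below ignores the logarithmic factors and constants hidden by the $\tilde{O}$ notation. The target accuracy $\epsilon = 10^{-4}$ is a hypothetical value chosen only to make the numbers concrete; it does not come from the paper.

$$\frac{1/\epsilon^{2}}{1/\epsilon^{7/4}} = \epsilon^{-1/4}, \qquad \epsilon = 10^{-4} \;\Rightarrow\; \epsilon^{-1/4} = 10.$$

Under these simplifications, the iteration bound for the accelerated method is about one order of magnitude smaller at this accuracy, and the gap widens slowly as $\epsilon$ shrinks.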

Abstract

Nesterov's accelerated gradient descent (AGD), an instance of the general family of "momentum methods", provably achieves faster convergence rate than gradient descent (GD) in the convex setting. However, whether these methods are superior to GD in the nonconvex setting remains open. This paper studies a simple variant of AGD, and shows that it escapes saddle points and finds a second-order stationary point in $\tilde{O}(1/\epsilon^{7/4})$ iterations, faster than the $\tilde{O}(1/\epsilon^{2})$ iterations required by GD. To the best of our knowledge, this is the first Hessian-free algorithm to find a second-order stationary point faster than GD, and also the first single-loop algorithm with a faster rate than GD even in the setting of finding a first-order stationary point. Our analysis is based on two key ideas: (1) the use of a simple Hamiltonian function, inspired by a…
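
To give a feel for why momentum can help near a saddle point, the following is a minimal sketch that compares plain gradient descent with a textbook Nesterov-style momentum update on a toy two-dimensional function. It is not the algorithm analyzed in the paper, which adds further components not shown here. The test function, the constants `GAMMA`, `ETA`, `THETA`, `RADIUS`, the starting point, and the function names are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical toy problem: f(x, y) = 0.5*x^2 - 0.5*GAMMA*y^2 + 0.25*y^4.
# The origin is a saddle point with weak negative curvature -GAMMA along y.
GAMMA = 0.01   # assumed curvature magnitude at the saddle
ETA = 0.1      # assumed step size
THETA = 0.1    # assumed momentum parameter (momentum weight is 1 - THETA)
RADIUS = 0.05  # call the saddle "escaped" once |y| exceeds this value


def grad(z):
    """Gradient of the toy function at z = (x, y)."""
    x, y = z
    return np.array([x, -GAMMA * y + y**3])


def gd_escape_steps(z0, max_iter=100_000):
    """Count gradient descent steps until |y| > RADIUS."""
    z = z0.copy()
    for t in range(max_iter):
        if abs(z[1]) > RADIUS:
            return t
        z = z - ETA * grad(z)
    return max_iter


def agd_escape_steps(z0, max_iter=100_000):
    """Count Nesterov-style momentum steps until |y| > RADIUS."""
    z = z0.copy()
    v = np.zeros_like(z)  # momentum: difference of the last two iterates
    for t in range(max_iter):
        if abs(z[1]) > RADIUS:
            return t
        look = z + (1.0 - THETA) * v        # look-ahead point
        z_next = look - ETA * grad(look)    # gradient step from look-ahead
        v = z_next - z
        z = z_next
    return max_iter


# Start very close to the saddle, slightly off the unstable axis.
z0 = np.array([0.5, 1e-6])
gd_steps = gd_escape_steps(z0)
agd_steps = agd_escape_steps(z0)
print("GD steps to escape: ", gd_steps)
print("AGD steps to escape:", agd_steps)
```

On this toy problem the momentum iteration amplifies the small component along the negative-curvature direction faster per step than gradient descent does, so it leaves the neighborhood of the saddle in fewer iterations. This is only an empirical illustration on one function and does not reproduce the paper's rates.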

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Optimization Algorithms Research