Gradient Descent Can Take Exponential Time to Escape Saddle Points

Simon S. Du; Chi Jin; Jason D. Lee; Michael I. Jordan; Barnabas Poczos; Aarti Singh

arXiv:1705.10412·math.OC·November 7, 2017·63 cites

Open Access

TL;DR

This paper demonstrates that standard gradient descent can take exponential time to escape saddle points, whereas perturbed gradient descent can do so efficiently, highlighting the importance of perturbations in non-convex optimization.

Contribution

It provides a theoretical example where gradient descent is exponentially slow at escaping saddle points, contrasting with the polynomial-time performance of perturbed gradient descent.

Findings

01. Gradient descent can take exponential time to escape saddle points.

02. Perturbed gradient descent escapes saddle points in polynomial time.

03. Experiments support the theoretical results.

Abstract

Although gradient descent (GD) almost always escapes saddle points asymptotically [Lee et al., 2016], this paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape. On the other hand, gradient descent with perturbations [Ge et al., 2015; Jin et al., 2017] is not slowed down by saddle points: it can find an approximate local minimizer in polynomial time. This result implies that GD is inherently slower than perturbed GD, and justifies the importance of adding perturbations for efficient non-convex optimization. While our focus is theoretical, we also present experiments that illustrate our theoretical findings.
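
The contrast between GD and perturbed GD can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy objective f(x, y) = x^2/2 - y^2/2 + y^4/4 (a strict saddle at the origin, local minima at (0, 1) and (0, -1)), the step size, the thresholds, and the helper names. The starting point is placed exactly on the saddle's stable manifold, which is a degenerate choice; the paper's result concerns natural random initializations and a different, carefully constructed function. The perturbation rule is a simplified variant of the idea in Jin et al. [2017] (add noise drawn from a small ball whenever the gradient is small), not their exact algorithm.

```python
import numpy as np


def f(p):
    """Toy objective (assumed for illustration): saddle at (0, 0), minima at (0, +1), (0, -1)."""
    x, y = p
    return 0.5 * x**2 - 0.5 * y**2 + 0.25 * y**4


def grad(p):
    """Gradient of the toy objective."""
    x, y = p
    return np.array([x, -y + y**3])


def gradient_descent(p0, step=0.1, iters=500):
    """Plain gradient descent with a fixed step size."""
    p = np.array(p0, dtype=float)
    for _ in range(iters):
        p = p - step * grad(p)
    return p


def sample_ball(rng, dim, radius):
    """Draw a point uniformly from a ball of the given radius."""
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    return radius * rng.uniform() ** (1.0 / dim) * direction


def perturbed_gradient_descent(p0, step=0.1, iters=500,
                               grad_tol=1e-3, radius=1e-2, seed=0):
    """Simplified perturbed GD: add small random noise whenever the gradient is small."""
    rng = np.random.default_rng(seed)
    p = np.array(p0, dtype=float)
    for _ in range(iters):
        if np.linalg.norm(grad(p)) < grad_tol:
            p = p + sample_ball(rng, p.size, radius)
        p = p - step * grad(p)
    return p


if __name__ == "__main__":
    start = [1.0, 0.0]  # on the stable manifold of the saddle (y = 0)
    p_gd = gradient_descent(start)
    p_pgd = perturbed_gradient_descent(start)
    print("GD final point:          ", p_gd, " f =", f(p_gd))
    print("Perturbed GD final point:", p_pgd, " f =", f(p_pgd))
```

In this sketch plain GD converges to the saddle because its y-coordinate never leaves zero, while the perturbed variant picks up a small component along the negative-curvature direction and then descends toward one of the two minima. The sketch only shows the mechanism by which perturbations help; it does not reproduce the exponential-versus-polynomial separation proved in the paper.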

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Topological and Geometric Data Analysis