First-order Methods Almost Always Avoid Saddle Points

Jason D. Lee; Ioannis Panageas; Georgios Piliouras; Max Simchowitz; Michael I. Jordan; Benjamin Recht

arXiv:1710.07406·stat.ML·October 23, 2017·76 cites

Open Access

TL;DR

This paper proves, using dynamical systems theory, that a broad class of first-order optimization algorithms avoids saddle points for almost all initializations, without requiring second-order derivatives or randomness beyond initialization.

Contribution

It introduces a unified dynamical systems framework showing that a variety of first-order methods avoid saddle points for almost all initializations, without second-order information.

Findings

01

First-order methods avoid saddle points for almost all initializations.

02

Appropriate instantiations of the Stable Manifold Theorem allow for a global stability analysis.

03

Neither second-order derivative information nor randomness beyond initialization is needed.

Abstract

We establish that first-order methods avoid saddle points for almost all initializations. Our results apply to a wide variety of first-order methods, including gradient descent, block coordinate descent, mirror descent and variants thereof. The connecting thread is that such algorithms can be studied from a dynamical systems perspective in which appropriate instantiations of the Stable Manifold Theorem allow for a global stability analysis. Thus, neither access to second-order derivative information nor randomness beyond initialization is necessary to provably avoid saddle points.
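
The claim in the abstract can be illustrated numerically with a minimal sketch. The toy objective f(x, y) = x^2/2 + y^4/4 - y^2/2, the step size, and the sampling box below are assumptions chosen for illustration and are not taken from the paper. This function has a saddle point at (0, 0) and minima at (0, 1) and (0, -1); for plain gradient descent, the only starting points that converge to the saddle lie on the line y = 0, which has measure zero.

```python
import numpy as np

def grad(p):
    """Gradient of the toy objective f(x, y) = x^2/2 + y^4/4 - y^2/2."""
    x, y = p[..., 0], p[..., 1]
    return np.stack([x, y**3 - y], axis=-1)

def gradient_descent(p0, step=0.1, iters=1000):
    """Plain gradient descent with a fixed step size (no noise, no Hessian)."""
    p = np.asarray(p0, dtype=float)
    for _ in range(iters):
        p = p - step * grad(p)
    return p

# Randomness is used only to pick the initial points.
rng = np.random.default_rng(0)
starts = rng.uniform(-1.5, 1.5, size=(1000, 2))
ends = gradient_descent(starts)

# Classify each endpoint: close to the saddle (0, 0) or to a minimum (0, +/-1).
near_saddle = np.linalg.norm(ends, axis=1) < 1e-3
near_minimum = (np.abs(ends[:, 0]) < 1e-3) & (np.abs(np.abs(ends[:, 1]) - 1.0) < 1e-3)

# A start placed exactly on the line y = 0 stays on it and reaches the saddle.
on_manifold = gradient_descent([0.7, 0.0])

print("fraction ending near a minimum:", near_minimum.mean())
print("fraction ending near the saddle:", near_saddle.mean())
print("endpoint for a start with y = 0:", on_manifold)
```

In this sketch the step size 0.1 is below the reciprocal of the largest curvature of the toy objective on the sampling box, in the spirit of the small-step-size setting. Randomly drawn starts should end near one of the two minima, while the hand-picked start with y = 0 converges to the saddle, which is the measure-zero exception that "almost all initializations" allows.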

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Model Reduction and Neural Networks