First-order Methods Almost Always Avoid Saddle Points
Jason D. Lee, Ioannis Panageas, Georgios Piliouras, Max Simchowitz, Michael I. Jordan, Benjamin Recht

TL;DR
This paper uses dynamical systems theory to prove that a broad class of first-order optimization algorithms avoid saddle points from almost all initializations. The proof requires neither second-order derivatives nor randomness beyond the initialization.
Contribution
It introduces a unified dynamical systems framework to show that various first-order methods almost surely avoid saddle points without second-order information.
Findings
First-order methods avoid saddle points for almost all initializations.
The Stable Manifold Theorem enables a global stability analysis.
Neither second-order derivatives nor randomness beyond initialization is needed.
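For gradient descent, the argument behind these findings can be sketched as follows. This outline is based on the standard reasoning for this setting and is not a full reproduction of the paper's general theorems, which also cover other first-order methods. It assumes f is twice continuously differentiable with an L-Lipschitz gradient and a fixed step size 0 < α < 1/L. The notation g for the update map, 𝒳* for the set of strict saddle points, and μ for Lebesgue measure is introduced here for illustration.

```latex
% Gradient descent viewed as a dynamical system (assumes f \in C^2, \nabla f L-Lipschitz)
g(x) = x - \alpha \nabla f(x), \qquad Dg(x) = I - \alpha \nabla^2 f(x)

% At a strict saddle x^*: \nabla f(x^*) = 0 and \lambda_{\min}(\nabla^2 f(x^*)) < 0, hence
\lambda_{\max}\bigl(Dg(x^*)\bigr) = 1 - \alpha\,\lambda_{\min}\bigl(\nabla^2 f(x^*)\bigr) > 1

% Since every eigenvalue of \nabla^2 f(x) lies in [-L, L] and 0 < \alpha < 1/L,
% every eigenvalue of Dg(x) is at least 1 - \alpha L > 0, so
\det Dg(x) > 0 \quad \text{for all } x

% Stable Manifold Theorem at each x^* (an unstable direction exists) + invertibility of g:
\mu\Bigl(\bigl\{\, x_0 : \lim_{k \to \infty} g^k(x_0) \in \mathcal{X}^* \,\bigr\}\Bigr) = 0
```

The eigenvalue greater than 1 means that the iterates are locally repelled along at least one direction. The stable manifold of each saddle therefore has dimension less than d. Because g is invertible, pulling these measure-zero sets back through all iterations keeps them measure zero.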
Abstract
We establish that first-order methods avoid saddle points for almost all initializations. Our results apply to a wide variety of first-order methods, including gradient descent, block coordinate descent, mirror descent and variants thereof. The connecting thread is that such algorithms can be studied from a dynamical systems perspective in which appropriate instantiations of the Stable Manifold Theorem allow for a global stability analysis. Thus, neither access to second-order derivative information nor randomness beyond initialization is necessary to provably avoid saddle points.
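The abstract's central claim can be checked on a toy problem. The Python sketch below is our own illustration, not an experiment from the paper. It runs plain gradient descent on f(x, y) = x²/2 + y⁴/4 − y²/2. This function has a strict saddle at the origin, with Hessian diag(1, −1), and minima at (0, ±1). The saddle's stable manifold is the line y = 0, which has measure zero. The step size 0.1, the iteration count, the sampling box [−1, 1]², and the helper names `grad`, `gradient_descent`, and `count_saddle_limits` are all hypothetical choices. On the strip |y| ≤ 1, the Hessian eigenvalues lie in [−1, 2], so the gradient is 2-Lipschitz there and 0.1 is below the 1/L threshold.

```python
import random


def grad(x, y):
    """Gradient of the toy objective f(x, y) = x**2 / 2 + y**4 / 4 - y**2 / 2."""
    return x, y**3 - y


def gradient_descent(x, y, step=0.1, iters=1000):
    """Run fixed-step gradient descent from (x, y) and return the final iterate."""
    for _ in range(iters):
        gx, gy = grad(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y


def count_saddle_limits(trials=1000, seed=0):
    """Count random starts in [-1, 1]^2 whose final iterate is still near the saddle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        _, y = gradient_descent(rng.uniform(-1, 1), rng.uniform(-1, 1))
        # "Near the saddle" here means |y| is still small; the minima have |y| = 1.
        if abs(y) < 1e-3:
            hits += 1
    return hits


if __name__ == "__main__":
    # Started on the stable manifold y = 0: y stays exactly 0.0 and x shrinks toward 0.
    print(gradient_descent(0.5, 0.0))
    # A tiny perturbation off the manifold escapes and approaches the minimum (0, 1).
    print(gradient_descent(0.5, 1e-6))
    # Number of random starts (out of 1000) that end near the saddle.
    print(count_saddle_limits())
```

In this toy case, a start exactly on y = 0 converges to the saddle. Any start with y ≠ 0 grows away from it by a factor of about 1 + α per step and settles at the minimum whose sign matches y. This is the "almost all initializations" behavior the abstract describes.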
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Model Reduction and Neural Networks
