Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions
Ioannis Panageas, Georgios Piliouras

TL;DR
This paper proves that, for a non-convex twice differentiable function, gradient descent avoids saddle points with a direction of strictly negative curvature for almost every initialization, even when critical points are non-isolated, and it gives an upper bound on the allowable step-size.
Contribution
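It extends previous results by showing that the set of initial conditions from which gradient descent converges to a strict saddle point has measure zero, for functions with non-isolated critical points and on forward-invariant convex regions, under weaker (non-globally Lipschitz) smoothness assumptions.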
Findings
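For almost every initial condition (in the Lebesgue sense), gradient descent does not converge to a strict saddle point.
The saddle points covered are those where the Hessian \nabla^2 f has at least one strictly negative eigenvalue, including non-isolated ones.
An upper bound on the allowable step-size is established.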
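The findings above can be illustrated with a small numerical sketch. Everything in it is an assumption chosen for illustration and does not come from the paper: the test function f(x, y) = x^2/2 + y^4/4 - y^2/2, the box [-2, 2]^2, the step size 0.05, and the helper names. This f has a strict saddle at (0, 0), minimizers at (0, 1) and (0, -1), and a gradient that is not globally Lipschitz; on the box the Hessian norm is at most 11, so 0.05 is below 1/11, and the box is mapped into itself by the update.

```python
import numpy as np

def grad_f(p):
    """Gradient of the illustrative f(x, y) = x^2/2 + y^4/4 - y^2/2."""
    x, y = p[..., 0], p[..., 1]
    return np.stack([x, y**3 - y], axis=-1)

def gradient_descent(p0, step=0.05, iters=2000):
    """Plain gradient descent p <- p - step * grad_f(p), vectorized over starts."""
    p = np.array(p0, dtype=float)
    for _ in range(iters):
        p = p - step * grad_f(p)
    return p

# Random starts in the box [-2, 2]^2 (assumed forward-invariant for step=0.05).
rng = np.random.default_rng(0)
starts = rng.uniform(-2.0, 2.0, size=(1000, 2))
limits = gradient_descent(starts)

# Distance of each limit to the nearest minimizer (0, 1) or (0, -1).
dist_to_min = np.abs(limits[:, 0]) + np.abs(np.abs(limits[:, 1]) - 1.0)

# A start on the line y = 0 lies on the saddle's stable set, which has measure zero.
on_stable_set = gradient_descent([1.5, 0.0])

print("all random starts reach a minimizer:", bool(np.all(dist_to_min < 1e-6)))
print("start on y = 0 ends at:", on_stable_set)
```

In this toy case the only starts that reach the saddle are those with y = 0 exactly, a line of zero area, which is why random initialization misses it.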
Abstract
Given a non-convex twice differentiable cost function f, we prove that the set of initial conditions so that gradient descent converges to saddle points where \nabla^2 f has at least one strictly negative eigenvalue has (Lebesgue) measure zero, even for cost functions f with non-isolated critical points, answering an open question in [Lee, Simchowitz, Jordan, Recht, COLT2016]. Moreover, this result extends to forward-invariant convex subspaces, allowing for weak (non-globally Lipschitz) smoothness assumptions. Finally, we produce an upper bound on the allowable step-size.
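In symbols, the main claim of the abstract can be sketched as follows. The notation (g, \alpha, x^*, W^s, \mu) is introduced here for illustration and is not taken from the abstract; the abstract states only that an upper bound on the step-size \alpha exists, so no specific bound is written below.

```latex
% Gradient descent as a discrete dynamical system with step-size \alpha > 0:
\[
  g(x) = x - \alpha \nabla f(x), \qquad x_{k+1} = g(x_k).
\]
% Set of initial conditions attracted to a saddle with a strictly negative
% Hessian eigenvalue, and the claim that it has Lebesgue measure \mu zero:
\[
  W^s = \Bigl\{ x_0 : \lim_{k \to \infty} g^k(x_0) = x^*,\
        \nabla f(x^*) = 0,\
        \lambda_{\min}\bigl(\nabla^2 f(x^*)\bigr) < 0 \Bigr\},
  \qquad \mu(W^s) = 0 .
\]
```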
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Bone and Joint Diseases
