Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions

Ioannis Panageas; Georgios Piliouras

arXiv:1605.00405 · math.DS · June 8, 2016 · 43 cites

TL;DR

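This paper proves that, for a non-convex twice differentiable cost function, gradient descent avoids saddle points where the Hessian has at least one strictly negative eigenvalue for every initial condition outside a set of Lebesgue measure zero, even when the critical points are non-isolated. It also gives an upper bound on the allowable step-size.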

Contribution

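It extends the result of Lee, Simchowitz, Jordan, and Recht (COLT 2016) by showing that the set of initial conditions from which gradient descent converges to such saddle points has measure zero even when critical points are non-isolated, and that the result carries over to forward-invariant convex regions, which permits weak (non-globally Lipschitz) smoothness assumptions.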

Findings

01

From a random initialization, the probability that gradient descent converges to a saddle point with a strictly negative Hessian eigenvalue is zero.

02

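The set of initial conditions that converge to saddle points where the Hessian has at least one strictly negative eigenvalue has Lebesgue measure zero, even with non-isolated critical points.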

03

An upper bound on the allowable step-size is established.

Abstract

Given a non-convex twice differentiable cost function f, we prove that the set of initial conditions so that gradient descent converges to saddle points where \nabla^2 f has at least one strictly negative eigenvalue has (Lebesgue) measure zero, even for cost functions f with non-isolated critical points, answering an open question in [Lee, Simchowitz, Jordan, Recht, COLT2016]. Moreover, this result extends to forward-invariant convex subspaces, allowing for weak (non-globally Lipschitz) smoothness assumptions. Finally, we produce an upper bound on the allowable step-size.
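
The measure-zero claim can be illustrated numerically. The sketch below is not taken from the paper: the test function f(x, y) = x^2/2 - y^2/2 + y^4/4, the step-size 0.1, the iteration count, and the helper names are all assumptions chosen for illustration. This f has a saddle at (0, 0), where the Hessian is diag(1, -1), and minimizers at (0, 1) and (0, -1). Its gradient is not globally Lipschitz, but the square [-1, 1]^2 is forward-invariant under the update for this step-size. Only initial points on the line y = 0, a set of measure zero, converge to the saddle.

```python
import random


def grad(x, y):
    # Gradient of the assumed test function
    # f(x, y) = 0.5*x**2 - 0.5*y**2 + 0.25*y**4.
    return x, -y + y**3


def gradient_descent(x, y, step=0.1, iters=1000):
    # Plain gradient descent with a fixed step-size (assumed value).
    for _ in range(iters):
        gx, gy = grad(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y


def count_runs_ending_near_saddle(trials=500, seed=0):
    # Sample initial points uniformly from [-1, 1]^2 and count the runs
    # whose final iterate is still close to the saddle at (0, 0).
    rng = random.Random(seed)
    near_saddle = 0
    for _ in range(trials):
        x0, y0 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        _, y = gradient_descent(x0, y0)
        if abs(y) < 0.5:
            near_saddle += 1
    return near_saddle


if __name__ == "__main__":
    print("runs ending near the saddle:", count_runs_ending_near_saddle())
    # A start exactly on the stable manifold y = 0 does reach the saddle.
    print("start on y = 0 ends at:", gradient_descent(0.3, 0.0))
```

Starting exactly on y = 0 keeps y at zero for every iterate, so that run converges to the saddle; any start with y0 different from zero, however small, is pushed toward one of the two minimizers.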

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Bone and Joint Diseases