Gradient descent avoids strict saddles with a simple line-search method too
Andreea-Alexandra Mușat, Nicolas Boumal

TL;DR
This paper proves that a modified line-search gradient descent method can avoid strict saddle points on smooth functions, extending the guarantee to Riemannian manifolds and relaxing common assumptions.
Contribution
It introduces a new convergence guarantee for line-search gradient descent avoiding strict saddles, using the Luzin $N^{-1}$ property and extending to Riemannian optimization.
Findings
Line-search GD avoids strict saddles on $C^2$ functions.
Extension of guarantees to Riemannian gradient descent.
Improved convergence guarantees for RGD with constant step size.
Abstract
It is known that gradient descent (GD) on a cost function generically avoids strict saddle points when using a small, constant step size. However, no such guarantee existed for GD with a line-search method. We provide one for a modified version of the standard Armijo backtracking method with generic, arbitrarily large initial step size. The proof underlines the double role of the Luzin $N^{-1}$ property for the iteration maps, and allows us to forgo the habitual Lipschitz gradient assumption. We extend this to the Riemannian setting (RGD), assuming the retraction is real analytic (though the cost function still only needs to be $C^2$). In closing, we also improve guarantees for RGD with a constant step size in some scenarios.
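For readers unfamiliar with the method named in the abstract, here is a minimal sketch of gradient descent with the standard Armijo backtracking line search. It is not the modified variant analyzed in the paper, whose details are not given here. The test function, the starting point, and the parameter names `alpha0`, `tau`, and `c` are assumptions chosen for illustration. The function has a strict saddle at the origin (Hessian eigenvalues 1 and -1) and two minimizers at (0, 1) and (0, -1).

```python
import numpy as np

# Illustrative cost (an assumption, not from the paper):
# f(x, y) = x^2/2 + y^4/4 - y^2/2, strict saddle at (0, 0), minima at (0, +-1).
def f(p):
    x, y = p
    return 0.5 * x**2 + 0.25 * y**4 - 0.5 * y**2

def grad_f(p):
    x, y = p
    return np.array([x, y**3 - y])

def armijo_gd(p0, alpha0=10.0, tau=0.5, c=1e-4, tol=1e-6, max_iter=1000):
    """Gradient descent with standard Armijo backtracking (hypothetical helper)."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(p)
        gnorm2 = float(g @ g)
        if np.sqrt(gnorm2) < tol:
            break
        # Start each iteration from the large initial step and shrink it
        # until the sufficient-decrease (Armijo) condition holds.
        alpha = alpha0
        while alpha > 1e-16 and f(p - alpha * g) > f(p) - c * alpha * gnorm2:
            alpha *= tau
        p = p - alpha * g
    return p

p_final = armijo_gd([0.7, 0.1])
print(p_final, f(p_final))
```

From this starting point the iterates end near one of the two minimizers rather than at the saddle. Which minimizer is reached can depend on `alpha0`, since a large accepted step may carry the iterate across the line y = 0.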
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Geometric Analysis and Curvature Flows
