Revisiting Normalized Gradient Descent: Fast Evasion of Saddle Points
Ryan Murray, Brian Swenson, Soummya Kar

TL;DR
This paper shows that continuous-time normalized gradient descent (NGD) escapes saddle points quickly in non-convex optimization, whereas standard gradient descent can take arbitrarily long to leave the neighborhood of a saddle point, a problem that is especially pronounced in high-dimensional problems.
Contribution
It provides a theoretical analysis showing NGD's quick escape from saddle points and establishes global convergence bounds for NGD in non-convex optimization.
Findings
NGD almost never converges to saddle points
Escape time from a ball of radius r about a saddle point is at most 5√κ r, where κ is the condition number of the Hessian at the saddle point
Global convergence-time bounds are derived for NGD
Abstract
The note considers normalized gradient descent (NGD), a natural modification of classical gradient descent (GD) in optimization problems. A serious shortcoming of GD in non-convex problems is that GD may take arbitrarily long to escape from the neighborhood of a saddle point. This issue can make the convergence of GD arbitrarily slow, particularly in high-dimensional non-convex problems where the relative number of saddle points is often large. The paper focuses on continuous-time descent. It is shown that, contrary to standard GD, NGD escapes saddle points 'quickly.' In particular, it is shown that (i) NGD 'almost never' converges to saddle points and (ii) the time required for NGD to escape from a ball of radius r about a saddle point x* is at most 5√κ r, where κ is the condition number of the Hessian of f at x*. As an application of this result, a global…
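The contrast between GD and NGD near a saddle point can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a toy quadratic saddle f(x, y) = 0.5 (x² − y²) with its saddle at the origin (so κ = 1), a forward-Euler discretization of the continuous-time flows with a hypothetical step size `h`, and a starting point chosen very close to the direction along which the flow approaches the saddle. It measures how long each flow takes to leave the ball of radius r and compares the NGD time with the 5√κ r bound quoted above, on this one example only.

```python
import math

# Toy saddle f(x, y) = 0.5 * (A*x**2 - B*y**2); A, B are illustrative assumptions.
A, B = 1.0, 1.0
KAPPA = max(A, B) / min(A, B)  # condition number of the Hessian diag(A, -B)


def grad(x, y):
    """Gradient of the toy saddle function."""
    return A * x, -B * y


def escape_time(x, y, r, normalized, h=1e-4, t_max=100.0):
    """Forward-Euler estimate of the time to leave the ball of radius r.

    normalized=False integrates dx/dt = -grad f(x)              (GD flow)
    normalized=True  integrates dx/dt = -grad f(x)/|grad f(x)|  (NGD flow)
    """
    t = 0.0
    while math.hypot(x, y) < r and t < t_max:
        gx, gy = grad(x, y)
        if normalized:
            n = math.hypot(gx, gy)
            if n == 0.0:  # exactly at the saddle: the NGD direction is undefined
                break
            gx, gy = gx / n, gy / n
        x, y = x - h * gx, y - h * gy
        t += h
    return t


r = 1.0
start = (0.5, 1e-4)  # nearly on the x-axis, which flows into the saddle

gd_t = escape_time(*start, r, normalized=False)
ngd_t = escape_time(*start, r, normalized=True)
bound = 5.0 * math.sqrt(KAPPA) * r

print(f"GD  escape time: {gd_t:.2f}")
print(f"NGD escape time: {ngd_t:.2f} (bound 5*sqrt(kappa)*r = {bound:.2f})")
```

Moving the starting point closer to the x-axis (a smaller second coordinate) lengthens the GD escape time without limit, since the unstable coordinate only grows exponentially from its initial value. The NGD trajectory moves at unit speed, so its escape time equals the length of its path inside the ball and stays bounded.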
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Advanced Optimization Algorithms Research · Stochastic Gradient Optimization Techniques
