Revisiting Normalized Gradient Descent: Fast Evasion of Saddle Points

Ryan Murray; Brian Swenson; Soummya Kar

arXiv:1711.05224 · math.OC · July 25, 2018 · IEEE Trans. Autom. Control · 1 citation

TL;DR

This paper shows that continuous-time normalized gradient descent (NGD) escapes saddle points quickly in non-convex optimization, whereas standard gradient descent (GD) may take arbitrarily long to do so, a problem that is most acute in high-dimensional settings where saddle points are common.

Contribution

The paper gives a theoretical analysis showing that NGD escapes saddle points quickly and, as an application of that result, derives global convergence-time bounds for NGD in non-convex optimization.
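
For orientation, the two continuous-time flows being compared can be written as below. The notation $x(t)$ for the trajectory and $\|\cdot\|$ for the Euclidean norm is assumed here for illustration and is not taken from this page.

$$
\text{GD:}\quad \dot{x}(t) = -\nabla f(x(t)), \qquad
\text{NGD:}\quad \dot{x}(t) = -\frac{\nabla f(x(t))}{\|\nabla f(x(t))\|}.
$$

Wherever the gradient is nonzero, the NGD trajectory moves at unit speed, so it does not slow down as the gradient shrinks near a saddle point, while the GD trajectory does.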

Findings

01

NGD almost never converges to saddle points

02

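The time for NGD to escape a ball of radius r around a saddle point is at most 5√κ r, where κ is the condition number of the Hessian at the saddle point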

03

Global convergence-time bounds are derived for NGD

Abstract

The note considers normalized gradient descent (NGD), a natural modification of classical gradient descent (GD) in optimization problems. A serious shortcoming of GD in non-convex problems is that GD may take arbitrarily long to escape from the neighborhood of a saddle point. This issue can make the convergence of GD arbitrarily slow, particularly in high-dimensional non-convex problems where the relative number of saddle points is often large. The paper focuses on continuous-time descent. It is shown that, contrary to standard GD, NGD escapes saddle points "quickly." In particular, it is shown that (i) NGD "almost never" converges to saddle points and (ii) the time required for NGD to escape from a ball of radius $r$ about a saddle point $x^{*}$ is at most $5\sqrt{\kappa}\, r$, where $\kappa$ is the condition number of the Hessian of $f$ at $x^{*}$. As an application of this result, a global…
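
The escape-time claim can be illustrated numerically. The sketch below is not the paper's method: the paper analyzes continuous-time descent, while this uses a forward-Euler discretization with a hypothetical step size `dt`, a hypothetical test function $f(x, y) = \tfrac{1}{2}(x^2 - y^2)$ with a saddle at the origin, and a hypothetical starting point close to the saddle's stable direction. The Hessian of this function is diag(1, -1), so $\kappa$ is taken to be 1 and the stated bound evaluates to $5r$.

```python
import math


def grad(x, y):
    # Gradient of the assumed test function f(x, y) = 0.5 * (x**2 - y**2),
    # which has a strict saddle point at the origin.
    return x, -y


def escape_time(p0, r, dt, normalized, max_steps=1_000_000):
    """Forward-Euler time until the iterate leaves the ball of radius r.

    Returns None if the iterate sits exactly on the critical point or
    does not leave the ball within max_steps.
    """
    x, y = p0
    for k in range(max_steps):
        if math.hypot(x, y) >= r:
            return k * dt
        gx, gy = grad(x, y)
        norm = math.hypot(gx, gy)
        if norm == 0.0:
            return None  # zero gradient: neither flow moves
        # NGD rescales the gradient to unit length; GD uses it as is.
        scale = dt / norm if normalized else dt
        x -= scale * gx
        y -= scale * gy
    return None


r = 1.0                # radius of the ball around the saddle
kappa = 1.0            # assumed condition number for Hessian diag(1, -1)
p0 = (0.5, 1e-6)       # assumed start, almost on the stable direction
dt = 1e-3              # assumed Euler step size

t_ngd = escape_time(p0, r, dt, normalized=True)
t_gd = escape_time(p0, r, dt, normalized=False)
bound = 5 * math.sqrt(kappa) * r

print(f"NGD escape time: {t_ngd}")
print(f"GD escape time:  {t_gd}")
print(f"Stated bound:    {bound}")
```

Moving the start closer to the stable direction (for example, shrinking the second coordinate of `p0`) lengthens the GD escape time roughly logarithmically in that coordinate, while the NGD escape time stays close to the length of the path it traces, which is the qualitative behavior the abstract describes.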

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Advanced Optimization Algorithms Research · Stochastic Gradient Optimization Techniques