# Extending the step-size restriction for gradient descent to avoid strict saddle points

**Authors:** Hayden Schaeffer, Scott G. McCalla

arXiv: 1908.01753 · 2019-08-06

## TL;DR

This paper enlarges the admissible step size for gradient descent to 2/L, where L is the Lipschitz constant of the gradient, and shows that the iterates almost surely avoid strict saddle points, including when a learning-rate schedule is used.

## Contribution

It establishes larger step-size bounds under which gradient descent avoids strict saddle points, extending previous results up to the sharp limit imposed by the convex case.
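Why 2/L is the "convex limit" can be seen on the one-dimensional quadratic f(x) = (L/2)x², whose gradient Lx is L-Lipschitz. One gradient step gives x_{k+1} = (1 - αL)x_k, so the iterates contract exactly when |1 - αL| < 1, i.e. 0 < α < 2/L; at α = 2/L they bounce between ±x_0 forever, and beyond it they diverge. The minimal Python sketch below checks this numerically; the function name `gd_quadratic` and the values L = 4, x_0 = 1 are illustrative choices, not taken from the paper.

```python
def gd_quadratic(L: float, alpha: float, x0: float, steps: int) -> float:
    """Run gradient descent on f(x) = (L/2) x^2 and return the final iterate.

    The gradient is f'(x) = L x, so each step is x <- x - alpha * L * x,
    i.e. multiplication by the factor (1 - alpha * L).
    """
    x = x0
    for _ in range(steps):
        x = x - alpha * L * x
    return x


L = 4.0  # Lipschitz constant of the gradient; the threshold 2/L is 0.5
for alpha in (0.45, 0.5, 0.55):
    # Below, at, and above the 2/L threshold.
    print(alpha, abs(gd_quadratic(L, alpha, x0=1.0, steps=200)))
```

With α = 0.45 the iterate shrinks toward 0, with α = 0.5 = 2/L it stays at magnitude 1.0, and with α = 0.55 it blows up, so no step-size bound above 2/L can guarantee convergence even for convex objectives.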

## Key findings

- Gradient descent with step size up to 2/L, where L is the Lipschitz constant of the gradient, avoids strict saddle points almost surely.
- The results also hold under learning-rate schedules, either continuously decaying or piecewise constant.
- With a uniformly random initialization, the probability of converging to a strict saddle point is zero.
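The saddle-avoidance claim can be illustrated on a hypothetical test function of our own choosing (not from the paper), f(x, y) = x²/2 + cos(y). Its Hessian is diag(1, -cos y), so the gradient is Lipschitz with L = 1 and 2/L = 2. The critical points (0, 2kπ) are strict saddles (Hessian eigenvalues 1 and -1), while (0, (2k+1)π) are minima. The sketch below runs fixed-step gradient descent with α = 1.9, close to the 2/L bound, from uniformly random starting points, and contrasts it with a start on the measure-zero set y = 0, which does converge to the saddle. It is a numerical illustration under these assumptions, not the paper's proof.

```python
import math
import random


def grad(x: float, y: float) -> tuple[float, float]:
    """Gradient of the hypothetical test function f(x, y) = x**2 / 2 + cos(y)."""
    return x, -math.sin(y)


def gradient_descent(x: float, y: float, alpha: float, steps: int) -> tuple[float, float]:
    """Fixed-step gradient descent: (x, y) <- (x, y) - alpha * grad f(x, y)."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - alpha * gx, y - alpha * gy
    return x, y


L = 1.0      # Lipschitz constant of grad f (largest |eigenvalue| of the Hessian)
alpha = 1.9  # step size below the 2/L = 2 bound
rng = random.Random(0)

# Random initializations: each run should end near a minimizer, where cos(y) = -1.
for _ in range(5):
    x0, y0 = rng.uniform(-3.0, 3.0), rng.uniform(-math.pi, math.pi)
    x, y = gradient_descent(x0, y0, alpha, steps=2000)
    print(f"start=({x0:+.3f}, {y0:+.3f})  end=({x:+.2e}, {y:+.4f})  cos(y)={math.cos(y):+.6f}")

# A start exactly on the saddle's stable set (y = 0) never leaves it.
x, y = gradient_descent(2.0, 0.0, alpha, steps=2000)
print(f"saddle start -> ({x:+.2e}, {y:+.4f})")
```

Because α < 2/L, the standard descent inequality f(x_{k+1}) ≤ f(x_k) - α(1 - αL/2)‖∇f(x_k)‖² holds, so the iterates settle at critical points; near the saddle the y-direction is expanded by the factor 1 + α, which is why random starts are pushed away from it.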

## Abstract

We provide larger step-size restrictions for which gradient descent based algorithms (almost surely) avoid strict saddle points. In particular, consider a twice differentiable (non-convex) objective function whose gradient has Lipschitz constant L and whose Hessian is well-behaved. We prove that the probability of initial conditions for gradient descent with step-size up to 2/L converging to a strict saddle point, given one uniformly random initialization, is zero. This extends previous results up to the sharp limit imposed by the convex case. In addition, the arguments hold in the case when a learning rate schedule is given, with either a continuous decaying rate or a piece-wise constant schedule.
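The abstract also covers learning-rate schedules. A minimal sketch, reusing the same hypothetical test function f(x, y) = x²/2 + cos(y) with L = 1, runs gradient descent under two made-up schedules that keep every step size inside (0, 2/L): a continuously decaying rate α_k = 0.5 + 1.4·exp(-k/50) and a piecewise-constant rate (1.9, then 1.0, then 0.5). These schedules, the helper names `decaying`, `piecewise`, and `scheduled_gd`, and the requirement that each α_k lie in (0, 2/L) are illustrative assumptions; the paper's precise conditions on the schedule are not reproduced here.

```python
import math
import random
from typing import Callable

L = 1.0  # Lipschitz constant of grad f for f(x, y) = x**2 / 2 + cos(y)


def grad(x: float, y: float) -> tuple[float, float]:
    """Gradient of the hypothetical test function f(x, y) = x**2 / 2 + cos(y)."""
    return x, -math.sin(y)


def decaying(k: int) -> float:
    """Hypothetical continuous decay from 1.9 toward 0.5; always below 2/L = 2."""
    return 0.5 + 1.4 * math.exp(-k / 50.0)


def piecewise(k: int) -> float:
    """Hypothetical piecewise-constant schedule; every value is below 2/L = 2."""
    if k < 100:
        return 1.9
    if k < 300:
        return 1.0
    return 0.5


def scheduled_gd(x: float, y: float, schedule: Callable[[int], float],
                 steps: int) -> tuple[float, float]:
    """Gradient descent whose k-th step uses the step size schedule(k)."""
    for k in range(steps):
        alpha = schedule(k)
        if not 0.0 < alpha < 2.0 / L:
            raise ValueError(f"step size {alpha} at k={k} is outside (0, 2/L)")
        gx, gy = grad(x, y)
        x, y = x - alpha * gx, y - alpha * gy
    return x, y


rng = random.Random(1)
for name, schedule in (("decaying", decaying), ("piecewise", piecewise)):
    x0, y0 = rng.uniform(-3.0, 3.0), rng.uniform(-math.pi, math.pi)
    x, y = scheduled_gd(x0, y0, schedule, steps=2000)
    print(f"{name:9s}: end=({x:+.2e}, {y:+.4f})  cos(y)={math.cos(y):+.6f}")
```

Both schedules keep α_k bounded away from 0 and from 2/L, so each step still decreases f; the explicit range check makes any schedule that leaves (0, 2/L) fail loudly instead of silently.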

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.01753/full.md

## References

30 references; the full list is in the complete paper: https://tomesphere.com/paper/1908.01753/full.md

---
Source: https://tomesphere.com/paper/1908.01753