# SSRGD: Simple Stochastic Recursive Gradient Descent for Escaping Saddle Points

**Authors:** Zhize Li

arXiv: 1904.09265 · 2019-06-24

## TL;DR

This paper introduces SSRGD, a simple perturbed variant of stochastic recursive gradient descent that finds second-order stationary points of nonconvex problems with near-optimal stochastic gradient complexity, without the negative-curvature search subroutine that comparable methods rely on.

## Contribution

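The paper presents SSRGD, which escapes saddle points by occasionally adding a uniform perturbation to stochastic recursive gradient descent. It attains near-optimal stochastic gradient complexity and admits a simpler analysis than algorithms that depend on a negative-curvature search subroutine such as Neon2.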

## Key findings

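- SSRGD finds an $(\epsilon,\delta)$-second-order stationary point with $\widetilde{O}(\sqrt{n}/\epsilon^2 + \sqrt{n}/\delta^4 + n/\delta^3)$ stochastic gradients on nonconvex finite-sum problems.
- As a by-product, it finds an $\epsilon$-first-order stationary point with $O(n+\sqrt{n}/\epsilon^2)$ stochastic gradients, which nearly matches the $\Omega(\sqrt{n}/\epsilon^2)$ lower bound of Fang et al. [2018].
- The results extend from nonconvex finite-sum problems to nonconvex online (expectation) problems, with corresponding convergence guarantees.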
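
For readers unfamiliar with the terminology, the two problem classes and the two notions of stationarity can be written out as follows. These are the standard definitions in this literature, and the abstract's notation appears to follow them. The symbols $d$, $\zeta$, $\mathcal{D}$ and $F$ are notation assumed here and are not taken from this summary.

$$
\text{finite-sum:}\quad \min_{x\in\mathbb{R}^d} f(x) := \frac{1}{n}\sum_{i=1}^{n} f_i(x),
\qquad
\text{online:}\quad \min_{x\in\mathbb{R}^d} f(x) := \mathbb{E}_{\zeta\sim\mathcal{D}}\big[F(x,\zeta)\big]
$$

$$
\epsilon\text{-first-order:}\quad \|\nabla f(x)\| \le \epsilon,
\qquad
(\epsilon,\delta)\text{-second-order:}\quad \|\nabla f(x)\| \le \epsilon \ \text{ and } \ \lambda_{\min}\big(\nabla^2 f(x)\big) \ge -\delta
$$

A strict saddle point satisfies the first condition but fails the second, because its Hessian has an eigenvalue below $-\delta$.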

## Abstract

We analyze stochastic gradient algorithms for optimizing nonconvex problems. In particular, our goal is to find local minima (second-order stationary points) instead of just finding first-order stationary points which may be some bad unstable saddle points. We show that a simple perturbed version of stochastic recursive gradient descent algorithm (called SSRGD) can find an $(\epsilon,\delta)$-second-order stationary point with $\widetilde{O}(\sqrt{n}/\epsilon^2 + \sqrt{n}/\delta^4 + n/\delta^3)$ stochastic gradient complexity for nonconvex finite-sum problems. As a by-product, SSRGD finds an $\epsilon$-first-order stationary point with $O(n+\sqrt{n}/\epsilon^2)$ stochastic gradients. These results are almost optimal since Fang et al. [2018] provided a lower bound $\Omega(\sqrt{n}/\epsilon^2)$ for finding even just an $\epsilon$-first-order stationary point. We emphasize that SSRGD algorithm for finding second-order stationary points is as simple as for finding first-order stationary points just by adding a uniform perturbation sometimes, while all other algorithms for finding second-order stationary points with similar gradient complexity need to combine with a negative-curvature search subroutine (e.g., Neon2 [Allen-Zhu and Li, 2018]). Moreover, the simple SSRGD algorithm gets a simpler analysis. Besides, we also extend our results from nonconvex finite-sum problems to nonconvex online (expectation) problems, and prove the corresponding convergence results.
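
The idea described in the abstract, a recursive gradient estimator plus an occasional uniform perturbation, can be illustrated with a minimal Python sketch. It is not the paper's exact algorithm: the function names, all default parameter values, the rule for when to perturb (a simple cooldown counter `t_thres`), and the toy problem are assumptions made for illustration only. The paper's own stopping and super-epoch rules are not reproduced here.

```python
import numpy as np

def sample_uniform_ball(rng, dim, radius):
    """Draw a point uniformly from the Euclidean ball of the given radius."""
    direction = rng.standard_normal(dim)
    direction /= np.linalg.norm(direction)
    return radius * rng.uniform() ** (1.0 / dim) * direction

def ssrgd_sketch(grad_i, n, x0, eta=0.05, batch=None, epoch_len=None,
                 g_thres=1e-3, radius=1e-2, t_thres=50, n_iters=2000, seed=0):
    """Simplified perturbed recursive-gradient loop (illustrative only).

    grad_i(x, i) returns the gradient of the i-th component f_i at x.
    All defaults are hypothetical choices, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    batch = batch or max(1, int(np.sqrt(n)))
    epoch_len = epoch_len or max(1, int(np.sqrt(n)))

    def full_grad(x):
        return np.mean([grad_i(x, i) for i in range(n)], axis=0)

    x = np.asarray(x0, dtype=float).copy()
    x_prev = x.copy()
    v = np.zeros_like(x)
    cooldown = 0  # iterations left before another perturbation is allowed
    for t in range(n_iters):
        if t % epoch_len == 0:
            # Epoch start: reset the estimator with a full gradient.
            v = full_grad(x)
            if cooldown == 0 and np.linalg.norm(v) <= g_thres:
                # Small gradient: possibly a saddle, so add a uniform kick.
                x = x + sample_uniform_ball(rng, x.size, radius)
                v = full_grad(x)
                cooldown = t_thres
        else:
            # Recursive update: v_t = v_{t-1} + mean_i(g_i(x_t) - g_i(x_{t-1})).
            idx = rng.integers(0, n, size=batch)
            v = v + np.mean([grad_i(x, i) - grad_i(x_prev, i) for i in idx],
                            axis=0)
        x_prev = x.copy()
        x = x - eta * v
        cooldown = max(0, cooldown - 1)
    return x

# Hypothetical toy problem: f_i(x) = 0.5*c_i*x0^2 - 0.5*x1^2 + 0.25*x1^4.
# The origin is a strict saddle; the local minima are at (0, +1) and (0, -1).
n = 16
c = np.linspace(0.5, 1.5, n)

def toy_grad_i(x, i):
    return np.array([c[i] * x[0], -x[1] + x[1] ** 3])

# Start exactly at the saddle, where every component gradient is zero.
x_final = ssrgd_sketch(toy_grad_i, n, x0=np.zeros(2))
print(x_final)
```

Started exactly at the saddle, plain gradient steps would never move, since every gradient there is zero. The perturbation gives the iterate a small component along the negative-curvature direction, which the descent steps then amplify. Because this sketch keeps perturbing whenever the gradient is small, the returned point can sit up to roughly `radius` away from a local minimum.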

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.09265/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/1904.09265/full.md

---
Source: https://tomesphere.com/paper/1904.09265