Second-Order Guarantees of Stochastic Gradient Descent in Non-Convex Optimization

Stefan Vlaski; Ali H. Sayed

arXiv:1908.07023·math.OC·August 21, 2019

TL;DR

This paper gives second-order guarantees for stochastic gradient descent in non-convex optimization. It shows that a relaxed relative bound on the gradient noise variance is enough for efficient escape from saddle-points, without injecting additional noise, alternating step-sizes, or assuming globally dispersive noise.

Contribution

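The paper replaces the usual concentration-based analysis, which requires gradient noise that is bounded with probability one or sub-Gaussian, with a mean-square analysis. Under this approach, a relative bound on the gradient noise variance suffices for saddle-point escape in non-convex SGD, provided a gradient noise component is present.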

Findings

1. Relaxed variance bounds suffice for saddle-point escape.
2. Mean-square analysis offers an alternative to concentration-based methods.
3. No need for extra noise injection or global dispersive noise assumptions.

Abstract

Recent years have seen increased interest in performance guarantees of gradient descent algorithms for non-convex optimization. A number of works have uncovered that gradient noise plays a critical role in the ability of gradient descent recursions to efficiently escape saddle-points and reach second-order stationary points. Most available works limit the gradient noise component to be bounded with probability one or sub-Gaussian and leverage concentration inequalities to arrive at high-probability results. We present an alternate approach, relying primarily on mean-square arguments and show that a more relaxed relative bound on the gradient noise variance is sufficient to ensure efficient escape from saddle-points without the need to inject additional noise, employ alternating step-sizes or rely on a global dispersive noise assumption, as long as a gradient noise component is present…
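
The role of gradient noise described in the abstract can be illustrated with a small numerical sketch. It is not the paper's algorithm or analysis. The toy cost, the step-size `mu`, the constants `beta` and `sigma`, and the noise model (zero-mean Gaussian whose second moment equals `beta**2 * ||grad||**2 + sigma**2`, one plausible reading of a "relative bound on the gradient noise variance") are all assumptions chosen for illustration. The sketch starts both recursions exactly at a strict saddle-point: the noise-free recursion never moves, while the noisy recursion drifts away and settles near a local minimum, where the smallest Hessian eigenvalue is positive.

```python
import numpy as np

def grad(w):
    # Gradient of the hypothetical toy cost
    #   J(w) = 0.5*w[0]**2 - 0.5*w[1]**2 + 0.25*w[1]**4,
    # which has a strict saddle at the origin and minima at (0, +1) and (0, -1).
    return np.array([w[0], -w[1] + w[1] ** 3])

def min_hessian_eig(w):
    # The Hessian of J is diagonal: diag(1, -1 + 3*w[1]**2).
    return min(1.0, -1.0 + 3.0 * w[1] ** 2)

def descend(w0, mu, steps, beta=0.0, sigma=0.0, seed=0):
    # Recursion w <- w - mu * (grad(w) + noise).
    # Assumed noise model: zero-mean Gaussian with
    #   E||noise||^2 = beta^2 * ||grad(w)||^2 + sigma^2.
    # Setting beta = sigma = 0 gives plain gradient descent.
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        g = grad(w)
        scale = np.sqrt((beta ** 2 * (g @ g) + sigma ** 2) / w.size)
        noise = scale * rng.standard_normal(w.size)
        w = w - mu * (g + noise)
    return w

# Both runs start exactly at the saddle-point.
w_gd = descend([0.0, 0.0], mu=0.05, steps=2000)
w_sgd = descend([0.0, 0.0], mu=0.05, steps=2000, beta=0.5, sigma=0.1)

print("noise-free iterate:", w_gd, "min Hessian eig:", min_hessian_eig(w_gd))
print("noisy iterate:     ", w_sgd, "min Hessian eig:", min_hessian_eig(w_sgd))
```

In this sketch the escape comes entirely from the noise already present in the gradient estimate; nothing extra is injected and the step-size is constant. Which of the two minima the noisy run reaches depends on the seed.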
