Escaping Saddles with Stochastic Gradients
Hadi Daneshmand, Jonas Kohler, Aurelien Lucchi, Thomas Hofmann

TL;DR
This paper shows that, in certain non-convex machine learning models, stochastic gradients have a strong component along negative curvature directions, so a plain SGD step can replace explicit noise injection for escaping saddle points.
Contribution
It introduces a new assumption under which a simple SGD step can replace the injection of isotropic noise for escaping saddles, and derives the first dimension-independent convergence rate for plain SGD to second-order stationary points.
Findings
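In certain non-convex machine learning models, stochastic gradients have a strong component along negative curvature directions.
Unlike isotropic noise, the variance along these directions is proportional to the magnitude of the corresponding eigenvalues and does not decrease with the dimension.
Under the proposed assumption, plain SGD converges to second-order stationary points without explicit noise injection.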
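The contrast with isotropic noise can be checked numerically. The sketch below is an illustration added here, not code from the paper: it estimates the second moment of isotropic noise (drawn uniformly from the unit sphere) along one fixed direction, which by symmetry equals 1/d and so shrinks as the dimension grows. The function name isotropic_second_moment and the sample size are hypothetical choices.

```python
import numpy as np

def isotropic_second_moment(d, n_samples=5000, seed=0):
    """Monte Carlo estimate of E[(v^T xi)^2] for xi uniform on the unit sphere in R^d."""
    rng = np.random.default_rng(seed)
    # Normalized Gaussian vectors are uniformly distributed on the unit sphere.
    xi = rng.standard_normal((n_samples, d))
    xi /= np.linalg.norm(xi, axis=1, keepdims=True)
    # Any fixed unit direction gives the same answer by symmetry; use the first axis.
    v = np.zeros(d)
    v[0] = 1.0
    return float(np.mean((xi @ v) ** 2))

if __name__ == "__main__":
    for d in (10, 100, 1000):
        # The estimate should track 1/d, i.e. decay with the dimension.
        print(d, isotropic_second_moment(d), 1.0 / d)
```

An isotropic perturbation therefore spends only about a 1/d fraction of its energy on any single negative curvature direction, which is the dimension dependence the findings above say stochastic gradients avoid in the models studied.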
Abstract
We analyze the variance of stochastic gradients along negative curvature directions in certain non-convex machine learning models and show that stochastic gradients exhibit a strong component along these directions. Furthermore, we show that, contrary to the case of isotropic noise, this variance is proportional to the magnitude of the corresponding eigenvalues and not decreasing in the dimensionality. Based upon this observation we propose a new assumption under which we show that the injection of explicit, isotropic noise usually applied to make gradient descent escape saddle points can successfully be replaced by a simple SGD step. Additionally, and under the same condition, we derive the first convergence rate for plain SGD to a second-order stationary point in a number of iterations that is independent of the problem dimension.
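The idea of replacing the isotropic perturbation with a single SGD step can be sketched on a toy problem. Everything below is an assumption made for illustration and is not taken from the paper: the objective, the constants (c, b, eta, r, g_thres, t_thres, f_thres), the function names, and the stopping rule, which is written in the style of perturbed gradient descent. The toy objective is built by hand so that the stochastic gradient noise lies along the negative curvature direction u at the saddle w = 0, with second moment (c*b)^2 regardless of the dimension.

```python
import numpy as np

def f(w, u, c):
    """Expected toy objective (additive constant dropped)."""
    sq = w @ w
    return 0.5 * sq + 0.25 * sq ** 2 - 0.5 * c * (u @ w) ** 2

def full_grad(w, u, c):
    """Exact gradient of f."""
    return w + (w @ w) * w - c * (u @ w) * u

def stoch_grad(w, u, c, b, s):
    """Gradient of one sample f_s(w) = 0.5|w|^2 + 0.25|w|^4 - (c/2)(s*u.w + b)^2, s = +-1."""
    return w + (w @ w) * w - c * (s * (u @ w) + b) * s * u

def hessian(w, u, c):
    """Exact Hessian of f, used only to inspect curvature."""
    d = w.size
    return (1.0 + w @ w) * np.eye(d) + 2.0 * np.outer(w, w) - c * np.outer(u, u)

def gd_with_sgd_escape(w0, u, c=2.0, b=0.5, eta=0.1, r=0.1, g_thres=1e-3,
                       t_thres=50, f_thres=1e-4, max_iters=1000, seed=0):
    """Gradient descent that takes one SGD step whenever the full gradient is small."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    checkpoint = None  # (iteration, point, value) recorded just before the last SGD step
    for t in range(max_iters):
        if checkpoint is not None and t - checkpoint[0] == t_thres:
            # If the SGD step did not lead to enough decrease, the point before
            # it is treated as an approximate second-order stationary point.
            if checkpoint[2] - f(w, u, c) < f_thres:
                return checkpoint[1]
            checkpoint = None
        g = full_grad(w, u, c)
        if np.linalg.norm(g) <= g_thres and checkpoint is None:
            # Small gradient: take a single SGD step instead of adding isotropic noise.
            checkpoint = (t, w.copy(), f(w, u, c))
            s = rng.choice([-1.0, 1.0])  # draw one random sample
            w = w - r * stoch_grad(w, u, c, b, s)
        else:
            w = w - eta * g  # ordinary gradient descent step
    return w

if __name__ == "__main__":
    d = 50
    u = np.zeros(d)
    u[0] = 1.0
    w = gd_with_sgd_escape(np.zeros(d), u)
    # Report the gradient norm and the smallest Hessian eigenvalue at the output.
    print(np.linalg.norm(full_grad(w, u, 2.0)),
          np.linalg.eigvalsh(hessian(w, u, 2.0)).min())
```

Plain gradient descent started at w = 0 never moves in this toy problem, because the full gradient there is exactly zero. The single SGD step supplies a displacement along u, after which ordinary gradient steps carry the iterate away from the saddle.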
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Sparse and Compressive Sensing Techniques
Methods: Stochastic Gradient Descent
