An Alternative View: When Does SGD Escape Local Minima?
Robert Kleinberg, Yuanzhi Li, Yang Yuan

TL;DR
This paper proposes an alternative view of SGD as operating on a smoothed loss function, explaining its ability to escape local minima and perform well on neural networks by leveraging local convexity properties.
Contribution
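It introduces a theoretical framework showing that SGD provably works on a class of functions much broader than convex ones: it suffices that, at every point, the weighted average of the gradients in its neighborhood is one-point convex with respect to the desired solution.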
Findings
SGD tends to avoid sharp local minima with small diameters.
Empirically, neural network loss surfaces exhibit one-point convexity locally.
The neighborhood size, which is governed by the step size and the gradient noise, determines whether SGD can escape a given minimum.
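The findings above can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the 1-D loss (a quadratic bowl with a narrow Gaussian dip), the constants `A`, `C`, `W`, `R`, and the choice of uniform noise, under which the smoothed gradient reduces to an exact finite difference. The deterministic descent on the smoothed gradient only models the expected drift of SGD; it is not a full stochastic simulation.

```python
import numpy as np

# Hypothetical 1-D loss (not from the paper): a quadratic bowl with its minimum
# at x* = 0, plus a narrow Gaussian dip near x = 1 that creates a sharp local minimum.
A, C, W = 0.2, 1.0, 0.05   # dip depth, centre, width (illustrative values)
X_STAR = 0.0

def f(x):
    return 0.5 * x**2 - A * np.exp(-(x - C)**2 / (2 * W**2))

def grad_f(x):
    return x + A * (x - C) / W**2 * np.exp(-(x - C)**2 / (2 * W**2))

def smoothed_grad(y, r):
    """Gradient of f averaged over a displacement drawn uniformly from [-r, r].

    In 1-D the average of f' over [y - r, y + r] is an exact finite difference.
    """
    return (f(y + r) - f(y - r)) / (2 * r)

def one_point_convex(grad, c, grid):
    """Check <-grad(y), x* - y> >= c * |x* - y|^2 on a grid (tiny float slack)."""
    d = grid - X_STAR
    return bool(np.all(grad(grid) * d >= c * d**2 - 1e-9))

def descend(grad, x0, eta=0.01, steps=2000):
    """Plain deterministic descent along the supplied gradient function."""
    x = x0
    for _ in range(steps):
        x = x - eta * grad(x)
    return x

grid = np.linspace(-2.0, 2.0, 401)
R = 0.5  # neighbourhood radius, standing in for step size times noise scale

# The raw gradient points away from x* on the inner wall of the dip,
# so one-point convexity fails; the smoothed gradient satisfies it.
raw_ok = one_point_convex(grad_f, 0.5, grid)
smooth_ok = one_point_convex(lambda y: smoothed_grad(y, R), 0.5, grid)

x_gd = descend(grad_f, x0=1.0)                             # GD on the raw loss
x_smooth = descend(lambda y: smoothed_grad(y, R), x0=1.0)  # expected SGD drift

print(raw_ok, smooth_ok, x_gd, x_smooth)
```

Started inside the dip, gradient descent on the raw loss stays trapped near x = 1, while descent on the smoothed gradient moves to the neighborhood of x* = 0. Shrinking `R` well below the dip width `W` makes the smoothed gradient approach the raw one, so the trap reappears, which is the sense in which the neighborhood size matters.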
Abstract
Stochastic gradient descent (SGD) is widely used in machine learning. Although being commonly viewed as a fast but not accurate version of gradient descent (GD), it always finds better solutions than GD for modern neural networks. In order to understand this phenomenon, we take an alternative view that SGD is working on the convolved (thus smoothed) version of the loss function. We show that, even if the function has many bad local minima or saddle points, as long as for every point x, the weighted average of the gradients of its neighborhoods is one point convex with respect to the desired solution x*, SGD will get close to, and then stay around x* with constant probability. More specifically, SGD will not get stuck at "sharp" local minima with small diameters, as long as the neighborhoods of these regions contain enough gradient information. The neighborhood size is…
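The condition described in the abstract can be written out as follows. This is a sketch under assumed notation rather than a quotation of the paper's statements: f is the loss, x^* the desired solution, \eta the step size, \omega_t a zero-mean gradient noise term whose distribution is taken to be independent of the current point for simplicity, and c > 0 a constant. Exchanging the gradient and the expectation is also assumed to be valid.

```latex
% SGD step with additive gradient noise (assumed form):
\[
x_{t+1} = x_t - \eta\,\bigl(\nabla f(x_t) + \omega_t\bigr)
\]
% With the change of variable y_t = x_t - \eta\,\nabla f(x_t), we have
% x_{t+1} = y_t - \eta\,\omega_t, and therefore:
\[
y_{t+1} = y_t - \eta\,\omega_t - \eta\,\nabla f(y_t - \eta\,\omega_t),
\qquad
\mathbb{E}_{\omega_t}\bigl[\nabla f(y_t - \eta\,\omega_t)\bigr]
  = \nabla\,\mathbb{E}_{\omega_t}\bigl[f(y_t - \eta\,\omega_t)\bigr]
\]
% One-point convexity of the smoothed (convolved) loss with respect to x^*:
\[
\Bigl\langle -\nabla\,\mathbb{E}_{\omega}\bigl[f(y - \eta\,\omega)\bigr],\; x^* - y \Bigr\rangle
  \;\ge\; c\,\lVert x^* - y \rVert^2
\]
```

In expectation, the sequence y_t follows the gradient of f convolved with the noise distribution scaled by \eta, so the last inequality only needs to hold for the averaged gradient over each neighborhood, not for \nabla f itself.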
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Machine Learning and Algorithms
Methods: Affine Coupling · Normalizing Flows · Stochastic Gradient Descent
