A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent
Mingze Wang, Lei Wu

TL;DR
This paper offers a theoretical framework for understanding how noise in stochastic gradient descent aligns with the loss landscape, influencing optimization dynamics and escape from sharp minima.
Contribution
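It introduces two metrics that quantify how strongly SGD noise aligns with the local loss landscape, and proves that this alignment holds for over-parameterized linear models and two-layer nonlinear networks under conditions independent of the degree of over-parameterization.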
Findings
SGD noise aligns favorably with local landscape geometry
SGD tends to escape sharp minima along flatter directions, whereas gradient descent escapes along the sharpest directions
Cyclical learning rates can leverage noise geometry for better optimization
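The gradient descent half of the second finding can be checked on a toy quadratic. The sketch below is an illustration, not the paper's analysis: the curvatures in `lam` and the step size `eta` are assumed values, chosen so that only the sharp direction is unstable. It does not reproduce the SGD behavior, which depends on the noise structure studied in the paper.

```python
import numpy as np

# Toy quadratic loss L(x) = 0.5 * sum(lam_i * x_i**2); its Hessian is diag(lam).
# Both lam and eta are assumed values for illustration only.
lam = np.array([10.0, 1.0])  # index 0: sharp direction, index 1: flat direction
eta = 0.25                   # satisfies 2/10 < eta < 2/1, so only the sharp direction is unstable

start = 1e-3
x = np.array([start, start])  # small perturbation away from the minimum at the origin
for _ in range(50):
    x = x - eta * lam * x     # gradient descent step; the gradient is lam * x

# Per-coordinate amplification of the initial perturbation after 50 steps.
growth = np.abs(x) / start
print(growth)
```

Each coordinate is multiplied by `1 - eta * lam_i` per step, so the perturbation grows along the sharp direction (factor of magnitude 1.5) and shrinks along the flat one (factor 0.75).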
Abstract
In this paper, we provide a theoretical study of noise geometry for minibatch stochastic gradient descent (SGD), a phenomenon where noise aligns favorably with the geometry of local landscape. We propose two metrics, derived from analyzing how noise influences the loss and subspace projection dynamics, to quantify the alignment strength. We show that for (over-parameterized) linear models and two-layer nonlinear networks, when measured by these metrics, the alignment can be provably guaranteed under conditions independent of the degree of over-parameterization. To showcase the utility of our noise geometry characterizations, we present a refined analysis of the mechanism by which SGD escapes from sharp minima. We reveal that unlike gradient descent (GD), which escapes along the sharpest directions, SGD tends to escape from flatter directions and cyclical learning rates can exploit this…
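To make the idea of noise aligning with the landscape concrete, here is a minimal sketch for an over-parameterized linear model with squared loss. The two quantities it computes (a Frobenius cosine similarity between the noise covariance and the Hessian, and the fraction of noise energy inside the Hessian's range) are illustrative stand-ins chosen for this example; they are not the two metrics proposed in the paper, which the abstract does not spell out. All sizes and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 200                        # assumed sizes; d > n means over-parameterized
X = rng.standard_normal((n, d))
theta_star = rng.standard_normal(d)
y = X @ theta_star                    # noiseless labels, so zero loss is attainable
theta = theta_star + 0.1 * rng.standard_normal(d)  # a point near a global minimum

# Per-sample gradients of 0.5 * (x_i @ theta - y_i)**2 with respect to theta.
residual = X @ theta - y
per_sample_grads = residual[:, None] * X
full_grad = per_sample_grads.mean(axis=0)

# Covariance of the batch-size-1 stochastic gradient, and the Hessian of the loss.
sigma = per_sample_grads.T @ per_sample_grads / n - np.outer(full_grad, full_grad)
hessian = X.T @ X / n

# Illustrative quantity 1: cosine similarity under the Frobenius inner product.
alignment = np.sum(sigma * hessian) / (np.linalg.norm(sigma) * np.linalg.norm(hessian))

# Illustrative quantity 2: share of noise energy inside the range of the Hessian.
projector = np.linalg.pinv(X) @ X     # orthogonal projector onto the row space of X
energy_in_range = np.trace(projector @ sigma) / np.trace(sigma)

print(alignment, energy_in_range)
```

For this linear model every per-sample gradient is a multiple of some `x_i`, so the noise lies entirely in the range of the Hessian and `energy_in_range` equals 1 up to rounding: there is no noise along the exactly flat directions.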
Taxonomy
Topics: Remote Sensing and LiDAR Applications · Scientific Research and Discoveries
Methods: Stochastic Gradient Descent
