Transient learning dynamics drive escape from sharp valleys in Stochastic Gradient Descent
Ning Yang, Yikuan Zhang, Qi Ouyang, Chao Tang, Yuhai Tu

TL;DR
By analyzing the transient dynamics of escape from sharp valleys, this paper identifies a nonequilibrium mechanism that explains SGD's preference for flat minima and links loss-landscape geometry to generalization.
Contribution
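It introduces a tractable physical model showing that SGD noise reshapes the loss landscape into an effective potential favoring flat solutions, and it identifies a transient freezing mechanism in which growing energy barriers eventually trap the dynamics in a single basin.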
Findings
SGD exhibits a transient exploratory phase in which trajectories repeatedly escape sharp valleys.
Increasing SGD noise delays freezing, promoting flatter minima.
A physical model links learning dynamics to landscape geometry.
Abstract
Stochastic gradient descent (SGD) is central to deep learning, yet the dynamical origin of its preference for flatter, more generalizable solutions remains unclear. Here, by analyzing SGD learning dynamics, we identify a nonequilibrium mechanism governing solution selection. Numerical experiments reveal a transient exploratory phase in which SGD trajectories repeatedly escape sharp valleys and transition toward flatter regions of the loss landscape. By using a tractable physical model, we show that the SGD noise reshapes the landscape into an effective potential that favors flat solutions. Crucially, we uncover a transient freezing mechanism: as training proceeds, growing energy barriers suppress inter-valley transitions and ultimately trap the dynamics within a single basin. Increasing the SGD noise strength delays this freezing, which enhances convergence to flatter minima. Together,…
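The escape from a sharp valley described in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's model: the piecewise-quadratic 1D loss, the curvature values, the constant Gaussian gradient noise, and the function names are all assumptions chosen for illustration. It starts every run at the bottom of the sharp valley and reports the fraction of runs that end in the flat valley.

```python
import random

# Hypothetical toy landscape (not from the paper): two quadratic valleys
# joined continuously at X_BARRIER.
K_SHARP, K_FLAT = 25.0, 1.0   # assumed curvatures of the sharp and flat valleys
X_SHARP, X_FLAT = -1.0, 1.0   # assumed positions of the two minima
X_BARRIER = -2.0 / 3.0        # point where the two parabolas have equal height


def grad(x):
    """Gradient of the piecewise-quadratic toy loss at x."""
    if x < X_BARRIER:
        return K_SHARP * (x - X_SHARP)   # sharp-valley branch
    return K_FLAT * (x - X_FLAT)         # flat-valley branch


def fraction_in_flat_valley(noise, n_runs=200, n_steps=2000, lr=0.02, seed=0):
    """Run noisy gradient descent from the sharp minimum.

    Constant additive Gaussian gradient noise stands in for SGD noise
    (an assumption; real SGD noise depends on the data and the parameters).
    Returns the fraction of runs whose final position is in the flat valley.
    """
    rng = random.Random(seed)
    n_flat = 0
    for _ in range(n_runs):
        x = X_SHARP
        for _ in range(n_steps):
            x -= lr * (grad(x) + noise * rng.gauss(0.0, 1.0))
        if x > X_BARRIER:
            n_flat += 1
    return n_flat / n_runs


if __name__ == "__main__":
    for noise in (0.0, 5.0, 20.0):
        print(noise, fraction_in_flat_valley(noise))
```

With `noise=0.0` the gradient at the starting point is zero, so every run stays in the sharp valley and the function returns 0.0. Nonzero noise lets trajectories cross the barrier. Because the noise here is constant in time, this sketch shows only the escape step and does not reproduce the freezing mechanism the abstract describes.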
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Reservoir Computing · Machine Learning in Materials Science
