Understanding Edge-of-Stability Training Dynamics with a Minimalist Example
Xingyu Zhu, Zixuan Wang, Xiang Wang, Mo Zhou, Rong Ge

TL;DR
This paper investigates the edge-of-stability (EoS) regime in neural network training by analyzing a simple, illustrative function. The analysis explains why sharpness hovers near the stability threshold, and the function exhibits bifurcating dynamics similar to those of real neural networks.
Contribution
The paper introduces a minimal example to rigorously analyze EoS phenomena, explaining the sharpness behavior and bifurcating dynamics observed in neural network training.
Findings
Sharpness at convergence is close to the stability threshold 2/η.
Training dynamics exhibit bifurcating behavior similar to neural networks.
The simple example captures key EoS phenomena observed in practice.
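To make the threshold 2/η concrete, here is a minimal sketch on a one-dimensional quadratic. It is not the function constructed in the paper. The helper name gd_quadratic, the starting point, and the step count are illustrative assumptions. On f(x) = 0.5 * a * x^2 the sharpness is the constant a, and each gradient descent step multiplies x by (1 - η a), so the iterates shrink when a < 2/η and grow when a > 2/η.

```python
def gd_quadratic(sharpness, eta, x0=1.0, steps=50):
    """Run gradient descent on f(x) = 0.5 * sharpness * x**2.

    Hypothetical helper for illustration only; returns |x| after `steps` updates.
    """
    x = x0
    for _ in range(steps):
        # The gradient of f is sharpness * x, so each step scales x by (1 - eta * sharpness).
        x = x - eta * sharpness * x
    return abs(x)


eta = 0.1
threshold = 2.0 / eta  # stability threshold 2/eta

# Sharpness slightly below the threshold: |1 - eta * a| < 1, iterates contract.
below = gd_quadratic(0.95 * threshold, eta)
# Sharpness slightly above the threshold: |1 - eta * a| > 1, iterates diverge.
above = gd_quadratic(1.05 * threshold, eta)

print(f"below threshold: |x| = {below:.3e}")
print(f"above threshold: |x| = {above:.3e}")
```

On a quadratic the sharpness never changes, so the run either converges or diverges. The EoS regime described in this summary concerns non-quadratic losses, where the sharpness itself moves during training and ends up close to 2/η.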
Abstract
Recently, researchers observed that gradient descent for deep neural networks operates in an "edge-of-stability" (EoS) regime: the sharpness (maximum eigenvalue of the Hessian) is often larger than the stability threshold 2/η (where η is the step size). Despite this, the loss oscillates and converges in the long run, and the sharpness at the end is just slightly below 2/η. While many other well-understood nonconvex objectives such as matrix factorization or two-layer networks can also converge despite large sharpness, there is often a larger gap between the sharpness of the endpoint and 2/η. In this paper, we study the EoS phenomenon by constructing a simple function that has the same behavior. We give a rigorous analysis of its training dynamics in a large local region and explain why the final converging point has sharpness close to 2/η. Globally we observe that the…
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Advanced Memory and Neural Computing
