Understanding Edge-of-Stability Training Dynamics with a Minimalist Example

Xingyu Zhu; Zixuan Wang; Xiang Wang; Mo Zhou; Rong Ge

arXiv:2210.03294 · cs.LG · February 22, 2023 · 1 citation

TL;DR

This paper investigates the edge-of-stability (EoS) regime in neural network training by analyzing a simple, illustrative function. The analysis explains why the sharpness at convergence stays close to the stability threshold 2/η, and the function exhibits bifurcating dynamics similar to those seen in real neural networks.

Contribution

The paper introduces a minimal example to rigorously analyze EoS phenomena, explaining the sharpness behavior and bifurcating dynamics observed in neural network training.

Findings

01. Sharpness at convergence is close to the stability threshold 2/η.

02. Training dynamics exhibit bifurcating behavior similar to neural networks.

03. The simple example captures key EoS phenomena observed in practice.

Abstract

Recently, researchers observed that gradient descent for deep neural networks operates in an "edge-of-stability" (EoS) regime: the sharpness (maximum eigenvalue of the Hessian) is often larger than the stability threshold $2/η$ (where $η$ is the step size). Despite this, the loss oscillates and converges in the long run, and the sharpness at the end is just slightly below $2/η$. While many other well-understood nonconvex objectives such as matrix factorization or two-layer networks can also converge despite large sharpness, there is often a larger gap between the sharpness of the endpoint and $2/η$. In this paper, we study the EoS phenomenon by constructing a simple function that has the same behavior. We give a rigorous analysis of its training dynamics in a large local region and explain why the final converging point has sharpness close to $2/η$. Globally we observe that the…
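The role of the threshold 2/η can be seen on a one-dimensional quadratic, where the sharpness is simply the curvature. The following is a minimal sketch under that assumption; it is not the function constructed in the paper, and the names `run_gd`, `curvature`, and `step_size` are hypothetical. On f(x) = 0.5 · curvature · x², each gradient descent step multiplies x by (1 − step_size · curvature), so the iterates shrink when the curvature is below 2/η and grow when it is above.

```python
def run_gd(curvature, step_size, x0=1.0, num_steps=50):
    """Run gradient descent on f(x) = 0.5 * curvature * x**2.

    Returns the list of |x| values, starting with |x0|.
    Illustrative toy only; not the paper's minimalist example.
    """
    x = x0
    trajectory = [abs(x)]
    for _ in range(num_steps):
        grad = curvature * x       # f'(x) = curvature * x
        x = x - step_size * grad   # equivalent to x *= (1 - step_size * curvature)
        trajectory.append(abs(x))
    return trajectory


step_size = 0.1
threshold = 2.0 / step_size  # stability threshold 2/eta for this step size

# Sharpness 10% below the threshold: |x| contracts by a factor of about 0.8 per step.
below = run_gd(0.9 * threshold, step_size)

# Sharpness 10% above the threshold: |x| expands by a factor of about 1.2 per step.
above = run_gd(1.1 * threshold, step_size)

print("below threshold, final |x|:", below[-1])
print("above threshold, final |x|:", above[-1])
```

On a pure quadratic, exceeding the threshold means divergence. The EoS observation described in the abstract is that, on neural network losses, gradient descent can still converge in the long run even though the sharpness often exceeds 2/η during training.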

Videos

Understanding Edge-of-Stability Training Dynamics with a Minimalist Example · SlidesLive

Taxonomy

Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Advanced Memory and Neural Computing