
TL;DR
The paper explains why full-batch gradient descent on neural networks drives the largest Hessian eigenvalue to the stability threshold 2/η, where η is the learning rate. It attributes this behavior to a functional called the edge coupling.
Contribution
It introduces the edge coupling, a functional on consecutive iterate pairs, as a mathematical framework for explaining the Edge of Stability phenomenon in neural network training.
Findings
The largest Hessian eigenvalue is driven to the threshold 2/η during training, where η is the learning rate.
Edge coupling classifies fixed points and period-two orbits in the training dynamics.
The analysis reveals conditions under which the eigenvalue approaches the stability boundary.
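To make the threshold 2/η concrete, here is a minimal sketch on a one-dimensional quadratic, which is the classical textbook setting and not the paper's edge-coupling analysis. The function name `gd_quadratic` and the chosen curvatures are illustrative assumptions. On f(x) = 0.5 · curvature · x², each gradient-descent step multiplies x by (1 − η · curvature), so the iterates shrink below 2/η, grow above it, and alternate between x and −x (a period-two orbit) exactly at it.

```python
def gd_quadratic(curvature, lr, steps, x0=1.0):
    """Run gradient descent on f(x) = 0.5 * curvature * x**2.

    Returns the list of iterates [x_0, x_1, ..., x_steps].
    Hypothetical helper for illustration only.
    """
    xs = [x0]
    x = x0
    for _ in range(steps):
        grad = curvature * x      # f'(x) = curvature * x
        x = x - lr * grad         # gradient-descent update
        xs.append(x)
    return xs


lr = 0.5
threshold = 2.0 / lr              # stability boundary 2/eta for this step size

below = gd_quadratic(0.9 * threshold, lr, steps=50)  # multiplier about -0.8
at = gd_quadratic(threshold, lr, steps=4)            # multiplier exactly -1
above = gd_quadratic(1.1 * threshold, lr, steps=50)  # multiplier about -1.2

print("below threshold, |x_50| =", abs(below[-1]))
print("at threshold, iterates  =", at)
print("above threshold, |x_50| =", abs(above[-1]))
```

In a neural network the curvature changes along the trajectory, which is why the quadratic picture alone does not explain why training is drawn toward the boundary; it only shows where the boundary is.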
Abstract
Full-batch gradient descent on neural networks drives the largest Hessian eigenvalue to the threshold 2/η, where η is the learning rate. This phenomenon, the Edge of Stability, has resisted a unified explanation: existing accounts establish self-regulation near the edge but do not explain why the trajectory is forced toward 2/η from arbitrary initialization. We introduce the edge coupling, a functional on consecutive iterate pairs whose coefficient is uniquely fixed by the gradient-descent update. Differencing its criticality condition yields a step recurrence with stability boundary 2/η, and a second-order expansion yields a loss-change formula whose telescoping sum forces curvature toward 2/η. The two formulas involve different Hessian averages, but the mean value theorem localizes each to the true Hessian at an interior point of the step segment, yielding…
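For orientation, the two kinds of formula mentioned in the abstract can be sketched with standard Taylor identities. This is a sketch under stated assumptions, not the paper's derivation: the symbols L (loss), θ_t (iterate), g_t (gradient), Δ_t (step), and the averaged Hessians written with a tilde and a bar are notation introduced here, and the paper's edge coupling itself is not reproduced.

```latex
% Gradient-descent update and step:
\theta_{t+1} = \theta_t - \eta\, g_t, \qquad
g_t = \nabla L(\theta_t), \qquad
\Delta_t = \theta_{t+1} - \theta_t = -\eta\, g_t .

% Step recurrence (fundamental theorem of calculus applied to the gradient):
\tilde H_t = \int_0^1 \nabla^2 L(\theta_t + s\,\Delta_t)\, ds, \qquad
g_{t+1} = g_t + \tilde H_t\, \Delta_t
\;\Longrightarrow\;
\Delta_{t+1} = (I - \eta\, \tilde H_t)\, \Delta_t .

% Along an eigendirection of \tilde H_t with eigenvalue \lambda, the step
% contracts iff |1 - \eta\lambda| < 1, i.e. 0 < \lambda < 2/\eta .

% Loss change (Taylor expansion with integral remainder):
\bar H_t = \int_0^1 (1 - s)\, \nabla^2 L(\theta_t + s\,\Delta_t)\, ds, \qquad
L(\theta_{t+1}) - L(\theta_t)
  = -\eta\, \lVert g_t \rVert^2 + \eta^2\, g_t^{\top} \bar H_t\, g_t .

% If the Hessian is constant along the segment, \bar H_t = \tfrac12 H and
L(\theta_{t+1}) - L(\theta_t)
  = -\eta\, \lVert g_t \rVert^2 \left( 1 - \frac{\eta}{2}\,
    \frac{g_t^{\top} H\, g_t}{\lVert g_t \rVert^2} \right),
% so the loss decreases iff the curvature along g_t is below 2/\eta .
```

The two averages weight the step segment differently (uniformly in the first, by 1 − s in the second), which is consistent with the abstract's remark that the two formulas involve different Hessian averages.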
