SGD at the Edge of Stability: The Stochastic Sharpness Gap

Fangshuo Liao; Afroditi Kolomvaki; Anastasios Kyrillidis

arXiv:2604.21016·cs.LG·April 24, 2026

TL;DR

This paper extends the analysis of the Edge of Stability phenomenon from full-batch gradient descent to mini-batch SGD, showing how stochastic gradient noise holds the sharpness a predictable gap below the 2/η threshold.

Contribution

The paper introduces stochastic self-stabilization, a theoretical framework that yields a closed-form formula for the sharpness gap in SGD and explains how batch size affects the sharpness of the solution.

Findings

01. SGD stabilizes the sharpness below 2/η because of gradient noise.

02. A closed-form formula predicts the sharpness gap from the gradient noise and the training parameters.

03. Smaller batch sizes lead to flatter solutions, matching empirical observations.

Abstract

When training neural networks with full-batch gradient descent (GD) and step size $η$, the largest eigenvalue of the Hessian, the sharpness $S(θ)$, rises to $2/η$ and hovers there, a phenomenon termed the Edge of Stability (EoS). Damian et al. (2023) showed that this behavior is explained by a self-stabilization mechanism driven by third-order structure of the loss, and that GD implicitly follows projected gradient descent (PGD) on the constraint $S(θ) \leq 2/η$. For mini-batch stochastic gradient descent (SGD), the sharpness stabilizes below $2/η$, with the gap widening as the batch size decreases; yet no theoretical explanation exists for this suppression. We introduce stochastic self-stabilization, extending the self-stabilization framework to SGD. Our key insight is that gradient noise injects variance into the oscillatory…
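
To make the $2/η$ threshold concrete, here is a minimal sketch on a toy quadratic loss. It is not the paper's setting or method. It estimates the sharpness by power iteration and then runs full-batch GD with a step size on either side of $2/S$. The matrix `H`, the helper names (`matvec`, `sharpness`, `run_gd`), and the step counts are all illustrative assumptions. On a quadratic the Hessian is constant, so this only shows why $S(θ) > 2/η$ is unstable. It does not reproduce the progressive sharpening or self-stabilization that the abstract describes.

```python
import math

def matvec(H, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]

def sharpness(H, iters=200):
    """Estimate the largest eigenvalue of a symmetric PSD matrix by power iteration."""
    v = [1.0] * len(H)
    for _ in range(iters):
        w = matvec(H, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the (unit-norm) converged vector
    return sum(a * b for a, b in zip(v, matvec(H, v)))

def run_gd(H, theta, eta, steps=100):
    """Full-batch GD on L(theta) = 0.5 * theta^T H theta; returns the final norm of theta."""
    for _ in range(steps):
        grad = matvec(H, theta)  # gradient of the quadratic is H @ theta
        theta = [t - eta * g for t, g in zip(theta, grad)]
    return math.sqrt(sum(t * t for t in theta))

# Hypothetical toy Hessian with eigenvalues 4 and 1, so the sharpness is 4.
H = [[4.0, 0.0], [0.0, 1.0]]
S = sharpness(H)
theta0 = [1.0, 1.0]

# Step size slightly below 2/S: every eigen-direction contracts.
stable = run_gd(H, theta0, eta=0.9 * 2.0 / S)
# Step size slightly above 2/S: the top eigen-direction oscillates and grows.
unstable = run_gd(H, theta0, eta=1.1 * 2.0 / S)

print(f"sharpness ~ {S:.3f}, stable norm = {stable:.3e}, unstable norm = {unstable:.3e}")
```

Along the top eigen-direction each GD step multiplies the coordinate by $1 - ηS$, which has magnitude below 1 exactly when $S < 2/η$. That is why the two runs behave so differently despite nearly identical step sizes.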
