Sharpness-Aware Minimization and the Edge of Stability
Philip M. Long, Peter L. Bartlett

TL;DR
This paper analyzes the 'edge of stability' phenomenon in neural network training and extends it to Sharpness-Aware Minimization (SAM). It derives a stability boundary for SAM that, unlike the one for gradient descent, depends on the gradient norm, and shows empirically that SAM operates at this boundary.
Contribution
The paper derives a new 'edge of stability' condition for SAM, revealing its dependence on the gradient norm, and empirically confirms SAM's operation at this stability boundary.
Findings
SAM's edge of stability depends on the gradient norm.
On three deep learning training tasks, SAM empirically operates at the derived stability boundary.
SAM is a variant of GD that prior work has shown to improve generalization.
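To make the gradient-norm dependence concrete, here is a minimal sketch under assumptions that are not taken from this page: a one-dimensional quadratic loss with curvature `lam`, and a SAM step that evaluates the gradient at `w + rho * g / |g|` before taking a GD step of size `eta`. Under those assumptions the loss stops decreasing once `eta * lam * (1 + rho * lam / |g|) = 2`, and solving for `lam` gives the threshold computed by the hypothetical helper `sam_edge` below. The paper's own expression may be stated differently.

```python
import math

def gd_edge(eta):
    """Edge of stability for plain GD with step size eta."""
    return 2.0 / eta

def sam_edge(eta, rho, grad_norm):
    """Curvature threshold for one SAM step on a 1-D quadratic (sketch assumption).

    Positive root of eta * lam * (1 + rho * lam / grad_norm) = 2 in lam.
    """
    ratio = 8.0 * rho / (eta * grad_norm)
    return grad_norm / (2.0 * rho) * (math.sqrt(1.0 + ratio) - 1.0)

# Illustrative values only; eta and rho are not taken from the paper.
eta, rho = 0.1, 0.05
for g in (10.0, 1.0, 0.1, 0.01):
    print(f"|g| = {g:5.2f}  SAM-edge = {sam_edge(eta, rho, g):7.4f}  GD edge = {gd_edge(eta):.1f}")
```

In this sketch the SAM threshold is always below `2 / eta`, shrinks as the gradient norm shrinks, and recovers the GD value as `rho` goes to zero.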
Abstract
Recent experiments have shown that, often, when training a neural network with gradient descent (GD) with a step size $\eta$, the operator norm of the Hessian of the loss grows until it approximately reaches $2/\eta$, after which it fluctuates around this value. The quantity $2/\eta$ has been called the "edge of stability" based on consideration of a local quadratic approximation of the loss. We perform a similar calculation to arrive at an "edge of stability" for Sharpness-Aware Minimization (SAM), a variant of GD which has been shown to improve its generalization. Unlike the case for GD, the resulting SAM-edge depends on the norm of the gradient. Using three deep learning training tasks, we see empirically that SAM operates on the edge of stability identified by this analysis.
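The quadratic-approximation argument for GD can be checked numerically. The sketch below is an illustration, not the paper's experiment: it runs GD on the one-dimensional loss `0.5 * sharpness * w**2`, where each step multiplies `w` by `1 - eta * sharpness`. The iterates therefore shrink when the sharpness is below `2 / eta` and grow when it is above. The function name `run_gd` and all numeric values are assumptions chosen for the demonstration.

```python
def run_gd(sharpness, eta, steps=50, w0=1.0):
    """Run GD on loss(w) = 0.5 * sharpness * w**2 and return the final |w|."""
    w = w0
    for _ in range(steps):
        grad = sharpness * w      # gradient of the quadratic loss
        w = w - eta * grad        # plain gradient descent step
    return abs(w)

eta = 0.1                         # illustrative step size
edge = 2.0 / eta                  # edge of stability for GD

below = run_gd(0.95 * edge, eta)  # sharpness just under the edge: iterates contract
above = run_gd(1.05 * edge, eta)  # sharpness just over the edge: iterates blow up
print(f"edge = {edge:.1f}, |w| below edge = {below:.3e}, |w| above edge = {above:.3e}")
```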
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Neural Networks and Applications
Methods: Sharpness-Aware Minimization
