Practical Sharpness-Aware Minimization Cannot Converge All the Way to Optima
Dongkuk Si, Chulhee Yun

TL;DR
This paper analyzes the convergence of practical Sharpness-Aware Minimization (SAM), meaning SAM with a constant perturbation size and gradient normalization. It shows that in many settings such SAM has limited capability to converge to global minima or stationary points, unlike the variants covered by earlier analyses that assume a decaying perturbation size or no normalization.
Contribution
It provides the first analysis of practical SAM configurations, demonstrating their limited convergence to optima and the unavoidable neighborhood bounds caused by fixed perturbation size.
Findings
Deterministic SAM achieves a $\frac{1}{T^2}$ convergence rate for strongly convex functions.
Stochastic SAM converges only up to an $O(\rho^2)$ neighborhood of the optimum.
Examples show the $O(\rho^2)$ bounds are unavoidable in practical SAM settings.
Abstract
Sharpness-Aware Minimization (SAM) is an optimizer that takes a descent step based on the gradient at a perturbation $y_t = x_t + \rho \frac{\nabla f(x_t)}{\lVert \nabla f(x_t) \rVert}$ of the current point $x_t$. Existing studies prove convergence of SAM for smooth functions, but they do so by assuming decaying perturbation size $\rho$ and/or no gradient normalization in $y_t$, which is detached from practice. To address this gap, we study deterministic/stochastic versions of SAM with practical configurations (i.e., constant $\rho$ and gradient normalization in $y_t$) and explore their convergence properties on smooth functions with (non)convexity assumptions. Perhaps surprisingly, in many scenarios, we find out that SAM has limited capability to converge to global minima or stationary points. For smooth strongly convex functions, we show that while deterministic SAM enjoys tight…
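To make the update concrete, here is a minimal sketch of one deterministic SAM step in the practical configuration the abstract describes: a constant perturbation size `rho` and a normalized gradient in the perturbation. The function names, the step size `lr`, the `eps` guard against a zero gradient, and the quadratic test function are assumptions made for illustration; they are not taken from the paper.

```python
import math


def sam_step(x, grad_fn, lr, rho, eps=1e-12):
    """One deterministic SAM step with constant rho and gradient normalization.

    x       : list of floats, the current point x_t
    grad_fn : callable returning the gradient of f at a point (list of floats)
    lr      : step size (hypothetical name)
    rho     : constant perturbation size
    eps     : small guard against division by zero (an assumption of this sketch)
    """
    g = grad_fn(x)
    norm = math.sqrt(sum(gi * gi for gi in g))
    # Perturbed point: y_t = x_t + rho * grad f(x_t) / ||grad f(x_t)||
    y = [xi + rho * gi / (norm + eps) for xi, gi in zip(x, g)]
    # Descent step uses the gradient evaluated at the perturbed point y_t
    g_y = grad_fn(y)
    return [xi - lr * gi for xi, gi in zip(x, g_y)]


def quad_grad(x):
    """Gradient of the illustrative function f(x) = 0.5 * ||x||^2."""
    return list(x)


x_next = sam_step([3.0, 4.0], quad_grad, lr=0.1, rho=0.5)
print(x_next)
```

On this example the gradient at `[3.0, 4.0]` has norm 5, so the perturbed point is approximately `[3.3, 4.4]` and the step lands near `[2.67, 3.56]`. A stochastic variant would replace `grad_fn` with a noisy gradient estimate; the sketch does not attempt to reproduce any of the paper's convergence results.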
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM
Methods: Sharpness-Aware Minimization · Gradient Normalization
