Practical Sharpness-Aware Minimization Cannot Converge All the Way to Optima

Dongkuk Si; Chulhee Yun

arXiv:2306.09850 · cs.LG · October 30, 2023 · 1 citation

TL;DR

This paper investigates the convergence properties of practical Sharpness-Aware Minimization (SAM) algorithms with constant perturbation and gradient normalization, revealing limitations in reaching global optima and highlighting differences from theoretical assumptions.

Contribution

It provides the first analysis of practical SAM configurations, demonstrating their limited convergence to optima and the unavoidable neighborhood bounds caused by fixed perturbation size.

Findings

01

Deterministic SAM achieves a $\frac{1}{T^2}$ convergence rate for strongly convex functions.

02

Stochastic SAM converges only up to an $O(\rho^2)$ neighborhood of the optimum.

03

Examples show the $O(\rho^2)$ bounds are unavoidable in practical SAM settings.

Abstract

Sharpness-Aware Minimization (SAM) is an optimizer that takes a descent step based on the gradient at a perturbation $y_t = x_t + \rho \frac{\nabla f(x_t)}{\|\nabla f(x_t)\|}$ of the current point $x_t$. Existing studies prove convergence of SAM for smooth functions, but they do so by assuming decaying perturbation size $\rho$ and/or no gradient normalization in $y_t$, which is detached from practice. To address this gap, we study deterministic/stochastic versions of SAM with practical configurations (i.e., constant $\rho$ and gradient normalization in $y_t$) and explore their convergence properties on smooth functions with (non)convexity assumptions. Perhaps surprisingly, in many scenarios, we find out that SAM has limited capability to converge to global minima or stationary points. For smooth strongly convex functions, we show that while deterministic SAM enjoys tight…
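The update described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the names `sam_step`, `run_sam`, and `quadratic_grad`, the `eps` guard against a zero gradient, and the toy objective $f(x) = \frac{1}{2}x^2$ are all assumptions made for the example.

```python
import numpy as np


def sam_step(x, grad_fn, rho, lr, eps=1e-12):
    """One deterministic SAM step with constant rho and gradient normalization."""
    g = grad_fn(x)
    # Perturbed point y_t = x_t + rho * grad / ||grad||.
    # eps is an assumption of this sketch; it avoids division by zero.
    y = x + rho * g / (np.linalg.norm(g) + eps)
    # The descent step uses the gradient evaluated at y_t, not at x_t.
    return x - lr * grad_fn(y)


def run_sam(x0, grad_fn, rho, lr, steps):
    """Apply sam_step repeatedly with a constant step size (hypothetical helper)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = sam_step(x, grad_fn, rho, lr)
    return x


def quadratic_grad(x):
    """Gradient of the toy objective f(x) = 0.5 * ||x||^2 (minimum at 0)."""
    return x


# Toy run: constant rho and constant step size on a 1-D quadratic.
x_final = run_sam([1.0], quadratic_grad, rho=0.5, lr=0.1, steps=500)
print("distance from optimum:", abs(x_final[0]))
```

On this one-dimensional toy problem the update reduces to $|x_{t+1}| = \big|(1-\eta)|x_t| - \eta\rho\big|$ for step size $\eta$, so with both $\eta$ and $\rho$ held constant the iterate ends up oscillating around the minimum at distance $\frac{\eta\rho}{2-\eta}$ rather than reaching it. This only illustrates how the normalized perturbation keeps the iterate moving near the optimum; it is not one of the paper's examples, and the paper's $O(\rho^2)$ neighborhood result concerns stochastic SAM.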

Videos

Practical Sharpness-Aware Minimization Cannot Converge All the Way to Optima · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM

Methods: Sharpness-Aware Minimization · Gradient Normalization