On Statistical Properties of Sharpness-Aware Minimization: Provable Guarantees

Kayhan Behdin; Rahul Mazumder

arXiv:2302.11836 · stat.ML · May 22, 2023 · 1 citation

Open Access

TL;DR

This paper provides a theoretical analysis of Sharpness-Aware Minimization (SAM), demonstrating its statistical advantages and flatter solutions in both convex and non-convex settings, supported by numerical experiments.

Contribution

It offers a new theoretical explanation of SAM's statistical properties and generalization benefits, especially in non-convex problems, complementing prior empirical findings.

Findings

01 In two settings (neural networks with a hidden layer and kernel regression), SAM achieves smaller prediction error than Gradient Descent under certain conditions.

02 SAM solutions are less sharp, i.e. they lie in flatter minima.

03 Numerical experiments validate theoretical predictions across various scenarios.

Abstract

Sharpness-Aware Minimization (SAM) is a recent optimization framework that aims to improve the generalization of deep neural networks by obtaining flatter (i.e., less sharp) solutions. As SAM has been numerically successful, recent papers have studied the theoretical aspects of the framework and have shown that SAM solutions are indeed flat. However, there has been limited theoretical exploration of the statistical properties of SAM. In this work, we directly study the statistical performance of SAM and present a new theoretical explanation of why SAM generalizes well. To this end, we study two statistical problems, neural networks with a hidden layer and kernel regression, and prove that, under certain conditions, SAM has smaller prediction error than Gradient Descent (GD). Our results concern both convex and non-convex settings, and show that SAM is particularly well-suited for non-convex…
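As a rough illustration of the framework the abstract describes, the sketch below contrasts a plain GD step with a SAM step in its commonly used form, where the gradient is evaluated at a point perturbed in the ascent direction. It uses a synthetic linear regression problem. The function names, the step size `lr`, the perturbation radius `rho`, and the data are illustrative assumptions, not taken from the paper, and the exact SAM variant analyzed there may differ.

```python
import numpy as np

def loss(w, X, y):
    """Half mean squared error of a linear model."""
    r = X @ w - y
    return 0.5 * np.mean(r ** 2)

def grad(w, X, y):
    """Gradient of the half mean squared error with respect to w."""
    return X.T @ (X @ w - y) / len(y)

def gd_step(w, X, y, lr):
    """One plain gradient descent step."""
    return w - lr * grad(w, X, y)

def sam_step(w, X, y, lr, rho, eps=1e-12):
    """One SAM step (assumed normalized-ascent form).

    1. Move to a nearby point along the normalized gradient, radius rho.
    2. Take the descent step from w using the gradient at that point.
    """
    g = grad(w, X, y)
    w_adv = w + rho * g / (np.linalg.norm(g) + eps)
    return w - lr * grad(w_adv, X, y)

# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=50)

w_gd = np.zeros(5)
w_sam = np.zeros(5)
for _ in range(200):
    w_gd = gd_step(w_gd, X, y, lr=0.1)
    w_sam = sam_step(w_sam, X, y, lr=0.1, rho=0.05)

print("GD loss: ", loss(w_gd, X, y))
print("SAM loss:", loss(w_sam, X, y))
```

Setting `rho=0` reduces `sam_step` to `gd_step`, which makes the extra ascent step easy to isolate when comparing the two methods. This convex toy problem only shows the mechanics of the update; it is not meant to reproduce the prediction-error comparison the paper proves.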

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and ELM

Methods: Linear Regression