Efficient Sharpness-aware Minimization for Improved Training of Neural Networks
Jiawei Du, Hanshu Yan, Jiashi Feng, Joey Tianyi Zhou, Liangli Zhen, Rick Siow Mong Goh, Vincent Y. F. Tan

TL;DR
This paper introduces ESAM, an efficient variant of SAM that keeps SAM's generalization benefits while cutting its extra computational overhead by approximately 60% through two new training strategies.
Contribution
ESAM proposes two efficient training strategies, Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection, which significantly reduce SAM's computational cost without sacrificing performance.
Findings
ESAM reduces the extra computation required over base optimizers from 100% (SAM) to 40%.
ESAM maintains or improves test accuracy on CIFAR and ImageNet.
Theoretical analysis supports the effectiveness of the proposed strategies.
Abstract
Overparametrized Deep Neural Networks (DNNs) often achieve astounding performances, but may potentially result in severe generalization error. Recently, the relation between the sharpness of the loss landscape and the generalization error has been established by Foret et al. (2020), in which the Sharpness Aware Minimizer (SAM) was proposed to mitigate the degradation of the generalization. Unfortunately, SAM's computational cost is roughly double that of base optimizers, such as Stochastic Gradient Descent (SGD). This paper thus proposes Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance. ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection. In the former, the sharpness measure is approximated by perturbing a stochastically chosen set of weights…
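To make the abstract's description concrete, here is a minimal sketch of a SAM-style update on a toy linear regression problem, with an optional random mask that perturbs only a subset of the weights. It is an illustration under stated assumptions, not the authors' implementation: the function names (`loss_and_grad`, `sam_step`), the Bernoulli mask with keep-probability `beta`, the `1/beta` rescaling, and all hyperparameter values are hypothetical choices made for this example.

```python
import numpy as np


def loss_and_grad(w, X, y):
    """Mean squared error of a linear model and its gradient w.r.t. w."""
    residual = X @ w - y
    loss = 0.5 * np.mean(residual ** 2)
    grad = X.T @ residual / len(y)
    return loss, grad


def sam_step(w, X, y, lr=0.1, rho=0.05, beta=1.0, rng=None):
    """One SAM-style update; beta < 1 perturbs only a random subset of weights."""
    rng = np.random.default_rng() if rng is None else rng
    # Pass 1: gradient at the current weights.
    _, g = loss_and_grad(w, X, y)
    # SAM perturbation: a step of length rho along the normalized gradient.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Assumption for illustration: keep each coordinate with probability beta
    # and rescale the kept ones by 1/beta. With beta=1.0 this is plain SAM.
    mask = rng.random(w.shape) < beta
    eps = np.where(mask, eps / beta, 0.0)
    # Pass 2: the gradient at the perturbed weights drives the actual update.
    _, g_perturbed = loss_and_grad(w + eps, X, y)
    return w - lr * g_perturbed, mask


# Toy usage: recover the weights of a noiseless linear model.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))
w_true = rng.normal(size=8)
y = X @ w_true

w = np.zeros(8)
initial_loss, _ = loss_and_grad(w, X, y)
for _ in range(200):
    w, mask = sam_step(w, X, y, lr=0.1, rho=0.05, beta=0.5, rng=rng)
final_loss, _ = loss_and_grad(w, X, y)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

The two gradient evaluations per step are what makes SAM roughly twice as expensive as the base optimizer. This toy still computes full gradients in both passes, so it shows only the shape of the update and does not reproduce any cost savings.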
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification
Methods: Test · Attentive Walk-Aggregating Graph Neural Network
