mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization
Kayhan Behdin, Qingquan Song, Aman Gupta, Sathiya Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, David Durfee, Rahul Mazumder

TL;DR
This paper introduces mSAM, a variant of Sharpness-Aware Minimization that aggregates adversarial perturbations across micro-batches, achieving flatter minima and better generalization in deep learning models.
Contribution
The paper provides a theoretical analysis showing that mSAM finds flatter minima than SAM and SGD, and demonstrates its practical effectiveness and computational efficiency on image classification and natural language processing tasks.
Findings
mSAM achieves flatter minima than SAM and SGD.
mSAM improves generalization performance across various tasks.
mSAM can be implemented efficiently without significant computational overhead.
Abstract
Modern deep learning models are over-parameterized, where different optima can result in widely varying generalization performance. The Sharpness-Aware Minimization (SAM) technique modifies the fundamental loss function that steers gradient descent methods toward flatter minima, which are believed to exhibit enhanced generalization prowess. Our study delves into a specific variant of SAM known as micro-batch SAM (mSAM). This variation involves aggregating updates derived from adversarial perturbations across multiple shards (micro-batches) of a mini-batch during training. We extend a recently developed and well-studied general framework for flatness analysis to theoretically show that SAM achieves flatter minima than SGD, and mSAM achieves even flatter minima than SAM. We provide a thorough empirical evaluation of various image classification and natural language processing tasks to…
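The micro-batch procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy linear model with mean squared error, and the names `grad_mse`, `msam_step`, `num_micro`, `rho`, and `lr` are hypothetical, chosen only for illustration. Each micro-batch computes its own normalized gradient-ascent perturbation, the gradient is re-evaluated at the perturbed weights, and the per-shard gradients are averaged into one update. With `num_micro=1` the step reduces to a plain SAM-style update on the whole mini-batch.

```python
import math

def grad_mse(w, batch):
    """Gradient of mean squared error for a linear model y ~ w . x."""
    g = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += 2.0 * err * xj / len(batch)
    return g

def msam_step(w, batch, num_micro=2, rho=0.05, lr=0.01):
    """One illustrative mSAM-style update (assumed form, toy model)."""
    # Split the mini-batch into equal shards; leftover examples are dropped.
    size = len(batch) // num_micro
    shards = [batch[i * size:(i + 1) * size] for i in range(num_micro)]
    avg = [0.0] * len(w)
    for shard in shards:
        # Adversarial perturbation computed from this shard only.
        g = grad_mse(w, shard)
        norm = math.sqrt(sum(v * v for v in g)) + 1e-12
        w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
        # Gradient at the perturbed point, still on the same shard.
        g_adv = grad_mse(w_adv, shard)
        avg = [a + ga / num_micro for a, ga in zip(avg, g_adv)]
    # Descend along the average of the per-shard perturbed gradients.
    return [wi - lr * ai for wi, ai in zip(w, avg)]

# Toy data generated by y = 2x, so the best weight is 2.0.
data = [([1.0], 2.0), ([2.0], 4.0), ([3.0], 6.0), ([4.0], 8.0)]
w = [0.0]
for _ in range(200):
    w = msam_step(w, data)
print(w)
```

In a real training loop the per-shard gradients would come from automatic differentiation, and the shards can be processed in parallel across devices, which is one way the extra cost could be kept low.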
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning
Methods: Stochastic Gradient Descent · Sharpness-Aware Minimization
