mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization

Kayhan Behdin; Qingquan Song; Aman Gupta; Sathiya Keerthi; Ayan Acharya; Borja Ocejo; Gregory Dexter; Rajiv Khanna; David Durfee; Rahul Mazumder

arXiv:2302.09693 · stat.ML · October 3, 2023 · 1 citation

Open Access

TL;DR

This paper studies mSAM, a variant of Sharpness-Aware Minimization (SAM) that averages the updates computed from adversarial perturbations on each micro-batch of a mini-batch, and reports flatter minima and better generalization in deep learning models.

Contribution

The paper extends an existing flatness-analysis framework to show theoretically that SAM reaches flatter minima than SGD and that mSAM reaches flatter minima than SAM, and it demonstrates mSAM's practical effectiveness and computational efficiency across tasks.

Findings

1. mSAM achieves flatter minima than SAM and SGD.

2. mSAM improves generalization performance across various tasks.

3. mSAM can be implemented efficiently without significant computational overhead.

Abstract

Modern deep learning models are over-parameterized, where different optima can result in widely varying generalization performance. The Sharpness-Aware Minimization (SAM) technique modifies the fundamental loss function that steers gradient descent methods toward flatter minima, which are believed to exhibit enhanced generalization prowess. Our study delves into a specific variant of SAM known as micro-batch SAM (mSAM). This variation involves aggregating updates derived from adversarial perturbations across multiple shards (micro-batches) of a mini-batch during training. We extend a recently developed and well-studied general framework for flatness analysis to theoretically show that SAM achieves flatter minima than SGD, and mSAM achieves even flatter minima than SAM. We provide a thorough empirical evaluation of various image classification and natural language processing tasks to…
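The micro-batch aggregation described in the abstract can be illustrated with a minimal NumPy sketch. It assumes a least-squares loss on a linear model, and the names `grad_mse`, `sam_gradient`, `msam_gradient`, `rho`, and `num_micro_batches` are hypothetical choices made for this illustration, not the authors' implementation. Each micro-batch gets its own adversarial perturbation, and the resulting gradients are averaged into one update direction.

```python
import numpy as np

def grad_mse(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    return X.T @ (X @ w - y) / len(y)

def sam_gradient(w, X, y, rho):
    """SAM-style gradient: evaluate the gradient at the perturbed point
    w + eps, where eps has norm rho and points along the local gradient."""
    g = grad_mse(w, X, y)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # small constant avoids 0/0
    return grad_mse(w + eps, X, y)

def msam_gradient(w, X, y, rho, num_micro_batches):
    """Illustrative mSAM-style gradient (an assumption of this sketch):
    split the mini-batch into micro-batches, compute a SAM-style gradient
    on each with its own perturbation, then average the results."""
    X_parts = np.array_split(X, num_micro_batches)
    y_parts = np.array_split(y, num_micro_batches)
    grads = [sam_gradient(w, Xs, ys, rho) for Xs, ys in zip(X_parts, y_parts)]
    return np.mean(grads, axis=0)

def train_msam(X, y, rho=0.05, lr=0.1, num_micro_batches=4, steps=200):
    """Plain gradient descent that uses the averaged micro-batch gradient."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w = w - lr * msam_gradient(w, X, y, rho, num_micro_batches)
    return w
```

In this sketch, setting `num_micro_batches=1` reduces the update to a SAM-style step on the whole mini-batch, and setting `rho=0` with equal-sized micro-batches recovers the ordinary mini-batch gradient.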

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning

Methods: Stochastic Gradient Descent · Sharpness-Aware Minimization