Momentum-SAM: Sharpness Aware Minimization without Computational Overhead

Marlon Becker; Frederick Altrock; Benjamin Risse

arXiv:2401.12033 · cs.LG · October 3, 2025 · 1 citation

Open Access · 1 repository · 1 video

TL;DR

Momentum-SAM is an optimization method that achieves sharpness-aware minimization without significant computational or memory overhead over SGD or Adam, improving generalization when training deep neural networks.

Contribution

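The paper proposes Momentum-SAM (MSAM), a variant of SAM that avoids SAM's additional gradient computation by perturbing parameters along the accumulated momentum vector instead of along a freshly computed gradient. This makes sharpness-aware optimization available at roughly the cost of SGD or Adam.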
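A minimal sketch of the idea in plain Python on a toy quadratic loss: the perturbation is built from the momentum buffer that SGD with momentum already stores, so each step needs only one gradient evaluation. The exact MSAM update is an assumption here. The sign convention (a normalized step of radius `rho` along the buffer, mirroring SAM's ascent step), the normalization, and applying the update to the unperturbed parameters are all illustrative choices, not the authors' method. The names `msam_step`, `CountingQuadraticGrad` and `quadratic_loss` are hypothetical; the official PyTorch code is in the marlonbecker/msam repository.

```python
import math


class CountingQuadraticGrad:
    """Gradient of the toy loss L(theta) = 0.5 * sum(a_i * theta_i**2).

    Counts its own calls so the per-step cost can be checked.
    """

    def __init__(self, curvatures):
        self.curvatures = curvatures
        self.calls = 0

    def __call__(self, theta):
        self.calls += 1
        return [a * t for a, t in zip(self.curvatures, theta)]


def quadratic_loss(theta, curvatures):
    return 0.5 * sum(a * t * t for a, t in zip(curvatures, theta))


def msam_step(theta, momentum, grad_fn, lr=0.05, mu=0.9, rho=0.05):
    """One hypothetical MSAM step on top of SGD with heavy-ball momentum.

    ASSUMPTION: the perturbation is a normalized step of radius rho along
    the stored momentum buffer (which accumulates past gradients), so no
    extra gradient is needed to build it.
    """
    m_norm = math.sqrt(sum(m * m for m in momentum))
    if m_norm > 0.0:
        eps = [rho * m / m_norm for m in momentum]
    else:
        eps = [0.0] * len(theta)  # very first step: empty buffer, no perturbation
    perturbed = [t + e for t, e in zip(theta, eps)]
    g = grad_fn(perturbed)  # the only gradient evaluation in this step
    new_momentum = [mu * m + gi for m, gi in zip(momentum, g)]
    # The momentum update is applied to the unperturbed parameters.
    new_theta = [t - lr * m for t, m in zip(theta, new_momentum)]
    return new_theta, new_momentum


# Usage: 200 steps on a toy quadratic with curvatures 1 and 4.
curvatures = [1.0, 4.0]
grad = CountingQuadraticGrad(curvatures)
theta, momentum = [1.0, 1.0], [0.0, 0.0]
initial_loss = quadratic_loss(theta, curvatures)
for _ in range(200):
    theta, momentum = msam_step(theta, momentum, grad)
print("gradient evaluations:", grad.calls)
print("final loss:", quadratic_loss(theta, curvatures))
```

The call counter makes the cost claim checkable: 200 steps use 200 gradient evaluations, the same as plain SGD with momentum.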

Findings

01

MSAM matches SAM's generalization benefits

02

MSAM requires training time similar to SGD/Adam

03

MSAM effectively reduces overfitting

Abstract

Sharpness Aware Minimization (SAM), a recently proposed optimization algorithm for deep neural networks, perturbs parameters by a gradient ascent step before the gradient calculation in order to guide the optimization into parameter-space regions of flat loss. While SAM has demonstrated significant generalization improvements and thus reduced overfitting, its computational cost is doubled by the additionally required gradient calculation, making SAM infeasible when computational capacity is limited. Motivated by Nesterov Accelerated Gradient (NAG), we propose Momentum-SAM (MSAM), which perturbs parameters in the direction of the accumulated momentum vector to achieve low sharpness without significant computational overhead or memory demands over SGD or Adam. We evaluate MSAM in detail and reveal insights on separable mechanisms of NAG, SAM and MSAM regarding training…
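The cost doubling described above can be made concrete with a minimal SAM step in plain Python: one gradient is computed at the current parameters only to build the ascent perturbation, and a second gradient at the perturbed point drives the update. This is a sketch under stated assumptions: plain SGD is used as the base optimizer for brevity, the loss is a toy quadratic, and `sam_step`, `CountingQuadraticGrad` and `quadratic_loss` are hypothetical names, not code from the marlonbecker/msam repository.

```python
import math


class CountingQuadraticGrad:
    """Gradient of the toy loss L(theta) = 0.5 * sum(a_i * theta_i**2).

    Counts its own calls so the per-step cost can be checked.
    """

    def __init__(self, curvatures):
        self.curvatures = curvatures
        self.calls = 0

    def __call__(self, theta):
        self.calls += 1
        return [a * t for a, t in zip(self.curvatures, theta)]


def quadratic_loss(theta, curvatures):
    return 0.5 * sum(a * t * t for a, t in zip(curvatures, theta))


def sam_step(theta, grad_fn, lr=0.05, rho=0.05):
    """One SAM step with plain SGD as the (assumed) base optimizer."""
    # Gradient 1: at the current parameters, used only for the ascent step.
    g = grad_fn(theta)
    g_norm = math.sqrt(sum(x * x for x in g))
    if g_norm > 0.0:
        eps = [rho * x / g_norm for x in g]
    else:
        eps = [0.0] * len(theta)
    # Gradient 2: at the perturbed point theta + eps; this one drives the update.
    g_sam = grad_fn([t + e for t, e in zip(theta, eps)])
    return [t - lr * gs for t, gs in zip(theta, g_sam)]


# Usage: 200 steps on a toy quadratic with curvatures 1 and 4.
curvatures = [1.0, 4.0]
grad = CountingQuadraticGrad(curvatures)
theta = [1.0, 1.0]
initial_loss = quadratic_loss(theta, curvatures)
for _ in range(200):
    theta = sam_step(theta, grad)
print("gradient evaluations:", grad.calls)
print("final loss:", quadratic_loss(theta, curvatures))
```

Here 200 updates cost 400 gradient evaluations, twice what plain SGD would need for the same number of steps; this is the overhead MSAM is designed to remove.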

Code & Models

Repositories

marlonbecker/msam
PyTorch · Official

Videos

Momentum-SAM: Sharpness Aware Minimization without Computational Overhead (SlidesLive)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM

Methods: Sharpness Aware Minimization · Nesterov Accelerated Gradient · Adam · Stochastic Gradient Descent