SAMPa: Sharpness-aware Minimization Parallelized

Wanyun Xie; Thomas Pethick; Volkan Cevher

arXiv:2410.10683 · cs.LG · October 15, 2024

TL;DR

SAMPa is a parallelized variant of sharpness-aware minimization (SAM) that computes SAM's two gradients in parallel, giving a twofold speedup over SAM when communication costs between devices are negligible. It keeps convergence guarantees and outperforms SAM across vision and language tasks.

Contribution

We introduce SAMPa, a simple modification of SAM that fully parallelizes its two gradient computations while preserving theoretical convergence guarantees.

Findings

01

SAMPa achieves a twofold speedup over SAM, assuming communication costs between devices are negligible.

02

SAMPa consistently outperforms SAM across both vision and language tasks.

03

SAMPa maintains convergence guarantees even for fixed perturbation sizes, established through a novel Lyapunov function.

Abstract

Sharpness-aware minimization (SAM) has been shown to improve the generalization of neural networks. However, each SAM update requires sequentially computing two gradients, effectively doubling the per-iteration cost compared to base optimizers like SGD. We propose a simple modification of SAM, termed SAMPa, which allows us to fully parallelize the two gradient computations. SAMPa achieves a twofold speedup over SAM under the assumption that communication costs between devices are negligible. Empirical results show that SAMPa ranks among the most efficient variants of SAM in terms of computational time. Additionally, our method consistently outperforms SAM across both vision and language tasks. Notably, SAMPa theoretically maintains convergence guarantees even for fixed perturbation sizes, which is established through a novel Lyapunov function. We in fact arrive at SAMPa by…
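
The sequential dependency mentioned in the abstract can be seen in a minimal sketch of a standard SAM step. This is not the authors' code and it does not implement SAMPa. The toy quadratic loss, the helper names (`loss`, `grad`, `sam_step`), and the values of `lr` and `rho` are all assumptions chosen for illustration. The point is only that the second gradient is evaluated at a point built from the first, so the two calls cannot overlap in plain SAM.

```python
import math

# Hypothetical toy problem: f(w) = 0.5 * (1 * w0^2 + 10 * w1^2).
CURVATURE = (1.0, 10.0)

def loss(w):
    return 0.5 * sum(a * wi * wi for a, wi in zip(CURVATURE, w))

def grad(w):
    return [a * wi for a, wi in zip(CURVATURE, w)]

def sam_step(w, lr=0.05, rho=0.1):
    """One standard SAM update (illustrative sketch, not SAMPa)."""
    # Gradient 1: evaluated at the current weights.
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0  # avoid division by zero
    # The perturbed point is built from g, so it is unknown until g is ready.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Gradient 2: evaluated at the perturbed point; it must wait for gradient 1.
    g_adv = grad(w_adv)
    # Descent step from the original weights using the perturbed gradient.
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

w = [1.0, 1.0]
for _ in range(200):
    w = sam_step(w)
print(loss(w))
```

Each call to `sam_step` performs two gradient evaluations back to back, which is the doubled per-iteration cost relative to SGD that the abstract describes. Removing the dependency between those two evaluations is what would let them run on separate devices at the same time.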

Code & Models

Repositories

lions-epfl/sampa
PyTorch · Official

Taxonomy

Topics: Brain Tumor Detection and Classification

Methods: Stochastic Gradient Descent · Balanced Selection · Segment Anything Model