SAMPa: Sharpness-aware Minimization Parallelized
Wanyun Xie, Thomas Pethick, Volkan Cevher

TL;DR
SAMPa is a parallelized variant of sharpness-aware minimization (SAM) that computes SAM's two gradients concurrently, giving a twofold speedup over SAM when communication costs between devices are negligible. It keeps convergence guarantees for fixed perturbation sizes and outperforms SAM on both vision and language tasks.
Contribution
We introduce SAMPa, a simple modification of SAM that fully parallelizes its two gradient computations, accelerating training while preserving theoretical convergence guarantees.
Findings
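SAMPa achieves a twofold speedup over SAM, assuming communication costs between devices are negligible.
SAMPa consistently outperforms SAM across both vision and language tasks, and ranks among the most efficient SAM variants in computational time.
SAMPa maintains convergence guarantees even for fixed perturbation sizes, established through a novel Lyapunov function.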
Abstract
Sharpness-aware minimization (SAM) has been shown to improve the generalization of neural networks. However, each SAM update requires sequentially computing two gradients, effectively doubling the per-iteration cost compared to base optimizers like SGD. We propose a simple modification of SAM, termed SAMPa, which allows us to fully parallelize the two gradient computations. SAMPa achieves a twofold speedup of SAM under the assumption that communication costs between devices are negligible. Empirical results show that SAMPa ranks among the most efficient variants of SAM in terms of computational time. Additionally, our method consistently outperforms SAM across both vision and language tasks. Notably, SAMPa theoretically maintains convergence guarantees even for fixed perturbation sizes, which is established through a novel Lyapunov function. We in fact arrive at SAMPa by…
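To make the sequential dependency in the abstract concrete, here is a minimal NumPy sketch on a toy quadratic loss. `sam_step` follows the standard SAM update, where the second gradient cannot start until the first has finished. `decoupled_step` is only an illustrative assumption about how the two gradients could be made independent, by building the perturbation from a gradient at an auxiliary point that is already available from the previous step; it is not taken from the text above and should not be read as the authors' exact algorithm. The toy loss, the function names, and the values of `lr` and `rho` are all hypothetical.

```python
import numpy as np

# Toy loss f(x) = 0.5 * sum(A * x**2); purely illustrative.
A = np.array([1.0, 10.0])

def loss_grad(x):
    """Gradient of the toy quadratic loss."""
    return A * x

def normalize(g, eps=1e-12):
    """Scale g to (approximately) unit norm; eps avoids division by zero."""
    return g / (np.linalg.norm(g) + eps)

def sam_step(x, lr=0.05, rho=0.05):
    """Standard SAM update: two gradients computed one after the other."""
    g = loss_grad(x)                  # first gradient, at the current point
    x_pert = x + rho * normalize(g)   # perturbed point depends on g
    g_pert = loss_grad(x_pert)        # second gradient must wait for the first
    return x - lr * g_pert

def decoupled_step(x, g_aux, lr=0.05, rho=0.05):
    """Assumed decoupling (hypothetical sketch, not the paper's algorithm).

    g_aux is the gradient at an auxiliary point, carried over from the
    previous step, so both gradient calls below depend only on values
    that are known before either call starts.
    """
    x_pert = x + rho * normalize(g_aux)  # perturbation uses the old gradient
    y_next = x - lr * g_aux              # next auxiliary point
    g_pert = loss_grad(x_pert)           # independent of the next call ...
    g_aux_next = loss_grad(y_next)       # ... so the two could run on two devices
    return x - lr * g_pert, g_aux_next

# Usage: run both variants from the same starting point.
x0 = np.array([1.0, 1.0])

x_sam = x0.copy()
for _ in range(200):
    x_sam = sam_step(x_sam)

x_dec, g_aux = x0.copy(), loss_grad(x0)
for _ in range(200):
    x_dec, g_aux = decoupled_step(x_dec, g_aux)
```

In this sketch the two calls inside `decoupled_step` still run one after the other; the point is only that neither needs the other's result, which is the property that would allow them to be placed on separate devices. With a fixed `rho`, both variants settle in a small neighborhood of the minimizer rather than at it.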
Taxonomy
Topics: Brain Tumor Detection and Classification
Methods: Stochastic Gradient Descent · Balanced Selection · Sharpness-Aware Minimization
