Avoiding spurious sharpness minimization broadens applicability of SAM
Sidak Pal Singh, Hossein Mobahi, Atish Agarwala, Yann Dauphin

TL;DR
This paper introduces Functional-SAM, a curvature regularization method that avoids spurious sharpness minimization by acting only through the statistics of the function the network implements. It improves generalization on both vision and NLP tasks, especially for large language models.
Contribution
We propose Functional-SAM, an alternative to SAM that regularizes curvature only through the statistics of the function implemented by the network, extending SAM's applicability to other domains, including NLP and large language models.
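The summary does not state the update rule, so what follows is a minimal sketch under our own assumptions, not the authors' implementation. We read "regularizing only through the function" as follows. Standard SAM evaluates the gradient at the perturbed point θ+ε, i.e. J(θ+ε)^T ∇_z ℓ(z(θ+ε)), where J is the Jacobian of the logits z with respect to the parameters, so part of its effect comes from changing the logit gradient ∇_z ℓ. The hypothetical Functional-SAM variant below keeps the logit gradient at the unperturbed point and perturbs only the Jacobian: J(θ+ε)^T ∇_z ℓ(z(θ)). The two-layer tanh network, the helper names (`forward`, `vjp`, `sam_gradients`), and the single-example setting are illustrative assumptions. Plain Python is used so every product is visible.

```python
import math

def forward(params, x):
    """Toy two-layer net (assumed for illustration): h = tanh(W1 x), logits z = W2 h."""
    W1, W2 = params["W1"], params["W2"]
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    z = [sum(w * hj for w, hj in zip(row, h)) for row in W2]
    return h, z

def softmax_xent(z, y):
    """Cross-entropy loss and its gradient w.r.t. the logits, p - onehot(y)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    p = [e / s for e in exps]
    g_z = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    return -math.log(p[y]), g_z

def vjp(params, x, g_z):
    """Pull a logit-space vector g_z back to parameter space: J(params)^T g_z."""
    h, _ = forward(params, x)
    W2 = params["W2"]
    gW2 = [[gk * hj for hj in h] for gk in g_z]
    g_h = [sum(W2[k][j] * g_z[k] for k in range(len(g_z))) for j in range(len(h))]
    g_pre = [gh * (1.0 - hj * hj) for gh, hj in zip(g_h, h)]  # tanh' = 1 - tanh^2
    gW1 = [[gp * xi for xi in x] for gp in g_pre]
    return {"W1": gW1, "W2": gW2}

def add_scaled(a, b, c):
    """Return a + c * b for parameter dicts of nested lists."""
    return {k: [[ai + c * bi for ai, bi in zip(ra, rb)] for ra, rb in zip(a[k], b[k])]
            for k in a}

def global_norm(g):
    """Euclidean norm over all parameter entries."""
    return math.sqrt(sum(v * v for k in g for row in g[k] for v in row))

def sam_gradients(params, x, y, rho):
    """Return (plain, SAM, hypothetical Functional-SAM) gradients for one example."""
    _, z = forward(params, x)
    _, g_z = softmax_xent(z, y)
    g = vjp(params, x, g_z)                                    # J(θ)^T ∇_z ℓ(θ)
    perturbed = add_scaled(params, g, rho / (global_norm(g) + 1e-12))  # θ + ρ g/||g||
    _, z_pert = forward(perturbed, x)
    _, g_z_pert = softmax_xent(z_pert, y)
    sam = vjp(perturbed, x, g_z_pert)                          # J(θ+ε)^T ∇_z ℓ(θ+ε)
    fsam = vjp(perturbed, x, g_z)                              # J(θ+ε)^T ∇_z ℓ(θ)
    return g, sam, fsam

if __name__ == "__main__":
    params = {"W1": [[0.5, -0.3], [0.2, 0.8], [-0.6, 0.1]],
              "W2": [[1.0, -0.5, 0.3], [-0.7, 0.4, 0.9]]}
    g, sam, fsam = sam_gradients(params, x=[1.0, -2.0], y=0, rho=0.05)
    print("SAM W2 grad:           ", sam["W2"])
    print("Functional-SAM W2 grad:", fsam["W2"])
```

With `rho=0` all three gradients coincide. For `rho > 0`, SAM and the sketched Functional-SAM differ only in which logit gradient is pulled back through the perturbed Jacobian, which is exactly the "logit statistics" channel the abstract describes.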
Findings
Functional-SAM outperforms SAM and AdamW baselines in various training settings.
Preconditioning the SAM perturbation further improves performance.
Our methods remain effective for billion-parameter-scale models.
Abstract
Curvature regularization techniques like Sharpness Aware Minimization (SAM) have shown great promise in improving generalization on vision tasks. However, we find that SAM performs poorly in domains like natural language processing (NLP), often degrading performance -- even with twice the compute budget. We investigate the discrepancy across domains and find that in the NLP setting, SAM is dominated by regularization of the logit statistics -- instead of improving the geometry of the function itself. We use this observation to develop an alternative algorithm we call Functional-SAM, which regularizes curvature only through modification of the statistics of the overall function implemented by the neural network, and avoids spurious minimization through logit manipulation. Furthermore, we argue that preconditioning the SAM perturbation also prevents spurious minimization, and when…
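The abstract argues that preconditioning the SAM perturbation also prevents spurious minimization, but it does not say which preconditioner is meant. As a hedged sketch, assume a diagonal, Adam-style preconditioner P = diag(sqrt(v) + δ), built from an exponential moving average v of squared gradients, and rescale the preconditioned direction to radius ρ: ε = ρ P^{-1} g / ||P^{-1} g||. The function names, the choice of preconditioner, and the omission of Adam's bias correction are our assumptions.

```python
import math

def update_second_moment(v, g, beta2=0.999):
    """Adam-style EMA of squared gradients (assumed source of the preconditioner)."""
    return [beta2 * vi + (1.0 - beta2) * gi * gi for vi, gi in zip(v, g)]

def preconditioned_perturbation(g, v, rho, eps=1e-8):
    """Perturbation ε = ρ · P^{-1} g / ||P^{-1} g|| with diagonal P = sqrt(v) + eps."""
    pg = [gi / (math.sqrt(vi) + eps) for gi, vi in zip(g, v)]  # apply P^{-1}
    norm = math.sqrt(sum(d * d for d in pg))
    return [rho * d / (norm + 1e-12) for d in pg]              # rescale to radius rho

if __name__ == "__main__":
    g = [3.0, 0.1]   # one large and one small gradient coordinate
    v = [9.0, 0.01]  # assumed second-moment estimates matching those scales
    print(preconditioned_perturbation(g, v, rho=0.05))
```

In the usage example, the raw gradient direction is dominated by the first coordinate, while the preconditioned perturbation splits ρ roughly evenly (about 0.035 per coordinate). The resulting ε would then feed into the same perturbed-gradient step as ordinary SAM.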
