GA-SAM: Gradient-Strength based Adaptive Sharpness-Aware Minimization for Improved Generalization
Zhiyuan Zhang, Ruixuan Luo, Qi Su, Xu Sun

TL;DR
This paper introduces GA-SAM, an adaptive optimization method that enhances model generalization by encouraging convergence to flat minima, especially effective in natural language tasks with significant gradient variations.
Contribution
The paper presents a novel theoretical perspective linking flat minima to generalization and proposes GA-SAM, an adaptive algorithm that improves flatness and generalization in language models.
Findings
GA-SAM outperforms standard SAM in language benchmarks.
Flat minima correlate with better generalization in NLP tasks.
GA-SAM effectively handles models with drastic gradient changes.
Abstract
Recently, the Sharpness-Aware Minimization (SAM) algorithm has shown state-of-the-art generalization abilities in vision tasks. It demonstrates that flat minima tend to imply better generalization abilities. However, it is difficult to apply SAM to some natural language tasks, especially to models with drastic gradient changes, such as RNNs. In this work, we analyze the relation between the flatness of a local minimum and its generalization ability from a novel and straightforward theoretical perspective. We propose that the shift between the training and test distributions can be equivalently seen as a virtual parameter corruption or perturbation, which can explain why flat minima that are robust against parameter corruptions or perturbations have better generalization performance. On this basis, we propose a Gradient-Strength based Adaptive Sharpness-Aware Minimization (GA-SAM)…
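For readers unfamiliar with the baseline, the following is a minimal sketch of one step of standard SAM (not GA-SAM) on a toy quadratic loss. The function names, the toy loss, and the values of `lr` and `rho` are illustrative assumptions, not taken from the paper. The gradient-strength based adaptation that GA-SAM adds is not shown, because the abstract above is truncated before describing it.

```python
import numpy as np

def sam_perturbation(grad, rho=0.05, eps=1e-12):
    """Return the SAM ascent direction: the gradient rescaled to L2 norm ~rho."""
    return rho * grad / (np.linalg.norm(grad) + eps)

def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One standard SAM update (illustrative; hyperparameters are assumptions).

    1. Compute the gradient at the current weights.
    2. Move to the approximate worst-case point within an L2 ball of radius rho.
    3. Descend using the gradient evaluated at that perturbed point.
    """
    g = grad_fn(w)
    e = sam_perturbation(g, rho)
    g_perturbed = grad_fn(w + e)
    return w - lr * g_perturbed

# Toy loss with one sharp direction (curvature 10) and one flat direction (0.1).
A = np.diag([10.0, 0.1])

def loss(w):
    return 0.5 * w @ A @ w

def grad(w):
    return A @ w

w0 = np.array([1.0, 1.0])
initial_loss = loss(w0)

w = w0.copy()
for _ in range(200):
    w = sam_step(w, grad)
final_loss = loss(w)
```

Each step costs two gradient evaluations, one at the current weights and one at the perturbed weights. The fixed radius `rho` is the part that can behave poorly when gradient magnitudes vary sharply across parameters or steps, which is the setting the abstract says GA-SAM targets.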
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Test · Sharpness-Aware Minimization
