Sharpness-Aware Minimization Leads to Low-Rank Features
Maksym Andriushchenko, Dara Bahri, Hossein Mobahi, Nicolas Flammarion

TL;DR
This paper shows that Sharpness-Aware Minimization (SAM) not only improves generalization but also reduces feature rank across a range of neural network architectures and tasks, and it gives a mechanistic explanation of the effect in a simple two-layer network.
Contribution
The paper shows that SAM reduces the rank of learned features across diverse models and tasks, and offers a mechanistic account of how this happens.
Findings
SAM reduces feature rank in neural networks.
The low-rank effect occurs broadly across architectures and training objectives.
Activation pruning by SAM contributes to rank reduction.
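The link between pruned activations and feature rank can be illustrated with a small NumPy sketch. It is not the paper's code: the rank measure (the number of principal components needed to explain 99% of the feature variance), the function name `feature_rank`, the layer sizes, and the choice to zero out 24 of 32 hidden units are all assumptions made for illustration. Units that output zero for every input contribute nothing to the feature matrix, so the rank cannot exceed the number of surviving units.

```python
import numpy as np


def feature_rank(features, variance_threshold=0.99):
    """Number of principal components needed to explain `variance_threshold`
    of the variance of `features` (rows = examples, columns = units).
    This particular rank measure is an assumption of the sketch."""
    centered = features - features.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    total = variances.sum()
    if total == 0.0:
        return 0  # constant features carry no variance
    cumulative = np.cumsum(variances) / total
    # Index of the first component at which the threshold is reached.
    rank = int(np.searchsorted(cumulative, variance_threshold)) + 1
    return min(rank, len(singular_values))


rng = np.random.default_rng(0)
inputs = rng.normal(size=(200, 20))    # 200 hypothetical examples
weights = rng.normal(size=(20, 32))    # hypothetical hidden layer, 32 units
dense_features = np.maximum(inputs @ weights, 0.0)  # ReLU activations

# Mimic pruning: 24 of the 32 units are zero on every example.
pruned_features = dense_features.copy()
pruned_features[:, 8:] = 0.0

dense_rank = feature_rank(dense_features)
pruned_rank = feature_rank(pruned_features)
print("feature rank with all units active:", dense_rank)
print("feature rank with 24 units pruned:", pruned_rank)
```

With only 8 active units left, the pruned feature matrix has at most 8 non-zero columns, so its feature rank is bounded by 8 regardless of the data.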
Abstract
Sharpness-aware minimization (SAM) is a recently proposed method that minimizes the sharpness of the training loss of a neural network. While its generalization improvement is well-known and is the primary motivation, we uncover an additional intriguing effect of SAM: reduction of the feature rank, which happens at different layers of a neural network. We show that this low-rank effect occurs very broadly: for different architectures such as fully-connected networks, convolutional networks, and vision transformers, and for different objectives such as regression, classification, and language-image contrastive training. To better understand this phenomenon, we provide a mechanistic understanding of how low-rank features arise in a simple two-layer network. We observe that a significant number of activations get entirely pruned by SAM, which directly contributes to the rank reduction. We confirm…
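For readers unfamiliar with SAM, the following is a minimal sketch of the commonly used one-step SAM update on a toy least-squares problem. It is not the authors' training code: the problem, the helper names `loss_and_grad` and `sam_step`, and the values of the learning rate `lr` and perturbation radius `rho` are assumptions chosen for illustration. Each step first moves the weights a distance `rho` along the normalized gradient (an approximate worst-case perturbation), then applies the gradient computed at that perturbed point to the original weights.

```python
import numpy as np


def loss_and_grad(w, X, y):
    """Mean squared error (with a 1/2 factor) and its gradient for a linear model."""
    residual = X @ w - y
    loss = 0.5 * np.mean(residual ** 2)
    grad = X.T @ residual / len(y)
    return loss, grad


def sam_step(w, X, y, lr=0.1, rho=0.05):
    """One SAM update: ascend to a nearby perturbed point, then descend
    from the original weights using the gradient taken there."""
    _, grad = loss_and_grad(w, X, y)
    # Approximate worst-case perturbation of norm rho (epsilon avoids 0/0).
    perturbation = rho * grad / (np.linalg.norm(grad) + 1e-12)
    _, grad_at_perturbed = loss_and_grad(w + perturbation, X, y)
    return w - lr * grad_at_perturbed


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # hypothetical inputs
w_true = rng.normal(size=5)            # hypothetical ground-truth weights
y = X @ w_true

w = np.zeros(5)
initial_loss, _ = loss_and_grad(w, X, y)
for _ in range(200):
    w = sam_step(w, X, y)
final_loss, _ = loss_and_grad(w, X, y)
print("loss before training:", initial_loss)
print("loss after 200 SAM steps:", final_loss)
```

Setting `rho=0.0` reduces `sam_step` to plain gradient descent, which makes it easy to compare the two on the same problem.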
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM
