Sparse Layer Sharpness-Aware Minimization for Efficient Fine-Tuning

Yifei Cheng; Xianglin Yang; Guoxia Wang; Chao Huang; Fei Ma; Dianhai Yu; Xiaochun Cao; Li Shen

arXiv:2602.09395·cs.LG·February 11, 2026

TL;DR

This paper introduces SL-SAM, a sparse layer approach to sharpness-aware minimization that reduces computational costs in fine-tuning large models while maintaining high performance, by dynamically selecting layers via a multi-armed bandit strategy.

Contribution

SL-SAM applies layer-level sparsity to SAM and frames layer selection as a multi-armed bandit problem, reducing computation during fine-tuning while keeping performance comparable to or better than existing methods.

Findings

01. SL-SAM achieves comparable or better performance than state-of-the-art methods.

02. SL-SAM activates significantly fewer parameters during backpropagation.

03. SL-SAM attains the top rank in LLM fine-tuning tasks.

Abstract

Sharpness-aware minimization (SAM) seeks minima in flat regions of the loss landscape to improve generalization in machine learning tasks, including fine-tuning. However, its extra parameter perturbation step doubles the computation cost, which becomes the bottleneck of SAM in practice. In this work, we propose SL-SAM, an approach that breaks this bottleneck by applying sparsity at the layer level. Our key innovation is to frame the dynamic selection of layers for both the gradient ascent (perturbation) and descent (update) steps as a multi-armed bandit problem. At the beginning of each iteration, SL-SAM samples a subset of the model's layers according to the gradient norm to participate in the backpropagation of the following parameter perturbation and update steps, thereby reducing the computational cost. We then provide the analysis to guarantee…
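The layer-sampling idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`sample_layers`, `sparse_sam_step`), the per-layer quadratic loss, the sampling-without-replacement rule, and the values of `rho` and `lr` are all assumptions made for illustration, and the bandit-style bookkeeping the paper describes is not modeled here. The sketch only shows the general shape: pick a subset of layers with probability proportional to a gradient-norm estimate, then run the standard SAM ascent and descent steps on those layers alone.

```python
import math
import random


def sample_layers(grad_norms, k, rng):
    """Pick k distinct layer indices, each draw proportional to gradient norm.

    Hypothetical rule for illustration; the paper's exact sampling
    distribution is not given in the abstract.
    """
    remaining = list(range(len(grad_norms)))
    chosen = []
    for _ in range(k):
        # Small constant keeps the total weight positive if all norms are zero.
        weights = [grad_norms[i] + 1e-12 for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return sorted(chosen)


def layer_grad(layer_weights, curvature):
    """Gradient of the toy loss 0.5 * curvature * ||w||^2 for one layer."""
    return [curvature * w for w in layer_weights]


def sparse_sam_step(weights, curvatures, active, rho=0.05, lr=0.1):
    """One SAM step restricted to the active layers; others are untouched."""
    # Ascent direction: gradients of the active layers only.
    grads = {l: layer_grad(weights[l], curvatures[l]) for l in active}
    norm = math.sqrt(sum(g * g for l in active for g in grads[l])) + 1e-12
    new_weights = [list(w) for w in weights]
    for l in active:
        # Perturb toward higher loss, then take the gradient at that point.
        perturbed = [w + rho * g / norm for w, g in zip(weights[l], grads[l])]
        sam_grad = layer_grad(perturbed, curvatures[l])
        # Descent: apply the perturbed-point gradient to the original weights.
        new_weights[l] = [w - lr * g for w, g in zip(weights[l], sam_grad)]
    return new_weights


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [[1.0, -1.0], [0.5, 0.5], [2.0, 0.0], [-1.0, 1.0]]
    curvatures = [1.0, 1.0, 3.0, 2.0]
    # In this toy the norms are computed exactly; a real system would use
    # cheaper running estimates, which is an assumption of this sketch.
    norms = [
        math.sqrt(sum(g * g for g in layer_grad(w, c)))
        for w, c in zip(weights, curvatures)
    ]
    active = sample_layers(norms, k=2, rng=rng)
    updated = sparse_sam_step(weights, curvatures, active)
    print("active layers:", active)
    print("updated weights:", updated)
```

Because only the sampled layers take part in either gradient computation, the cost of both SAM passes scales with the size of the active subset rather than the full model, which is the saving the abstract points to.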

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis