GAST: Gradient-aligned Sparse Tuning of Large Language Models with Data-layer Selection
Kai Yao, Zhenghan Song, Kaixin Wu, Mingjie Zhong, Danzhao Cheng, Zhaorui Tan, Yixin Ji, and Penglei Gao

TL;DR
GAST introduces a unified sparse tuning method that adaptively selects impactful data points and model layers, improving parameter-efficient fine-tuning of large language models beyond existing layer- or data-only approaches.
Contribution
It proposes a novel unified approach for sparse fine-tuning that considers both data and layer selection simultaneously, addressing limitations of prior methods.
Findings
GAST outperforms baseline methods in experiments.
It effectively identifies impactful data points for each layer.
The method enhances fine-tuning efficiency and model performance.
Abstract
Parameter-Efficient Fine-Tuning (PEFT) has become a key strategy for adapting large language models, with recent advances in sparse tuning reducing overhead by selectively updating key parameters or subsets of data. Existing approaches generally follow one of two distinct paradigms: layer-selective methods, which fine-tune critical layers to minimize computational load, and data-selective methods, which select effective training subsets to speed up training. However, current methods typically overlook the fact that different data points contribute to different model layers to varying degrees, and they often discard potentially valuable information from data perceived as low quality. To address these limitations, we propose Gradient-aligned Sparse Tuning (GAST), an innovative method that simultaneously performs selective fine-tuning at both data and layer dimensions as integral components…
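The abstract as quoted here is truncated and does not spell out the selection rule, so the following is only an illustrative sketch of what per-layer, gradient-aligned data selection could look like, not the authors' algorithm. It assumes that alignment is measured by cosine similarity between each sample's gradient for a layer and a reference gradient for that layer (for example, one computed on a held-out batch), and that a fixed fraction of samples is kept per layer. The function names, the `keep_ratio` parameter, and the reference gradient are all hypothetical.

```python
import numpy as np


def select_samples_per_layer(per_sample_grads, reference_grads, keep_ratio=0.5):
    """Pick, for each layer, the samples whose gradients best align with a reference.

    per_sample_grads: dict mapping layer name -> array of shape (n_samples, n_params)
    reference_grads:  dict mapping layer name -> array of shape (n_params,)
    Returns a dict mapping layer name -> boolean mask of shape (n_samples,).

    Assumption (not from the paper): alignment is cosine similarity and the
    top `keep_ratio` fraction of samples is kept independently for every layer.
    """
    eps = 1e-12  # guards against division by zero for all-zero gradients
    masks = {}
    for layer, grads in per_sample_grads.items():
        ref = reference_grads[layer]
        norms = np.linalg.norm(grads, axis=1) * np.linalg.norm(ref)
        sims = (grads @ ref) / (norms + eps)
        k = max(1, int(round(keep_ratio * len(sims))))
        top = np.argsort(-sims)[:k]  # indices of the k most aligned samples
        mask = np.zeros(len(sims), dtype=bool)
        mask[top] = True
        masks[layer] = mask
    return masks


def aggregate_selected(per_sample_grads, masks):
    """Average only the selected samples' gradients, separately for each layer."""
    return {
        layer: grads[masks[layer]].mean(axis=0)
        for layer, grads in per_sample_grads.items()
    }
```

Because the mask is computed per layer, the same sample can drive the update of one layer while being skipped for another, which is the behaviour the abstract motivates when it notes that data points contribute differently to different layers. A sample that a global filter would discard as low quality can still be used wherever its gradient aligns.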
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques
