AdaGradSelect: An adaptive gradient-guided layer selection method for efficient fine-tuning of SLMs
Anshul Kumar, Gagan Raj Gupta, Manisha Chawla

TL;DR
AdaGradSelect is an adaptive layer selection method that improves fine-tuning efficiency of small language models by selectively updating transformer blocks based on gradient importance, reducing training time and memory usage while maintaining high performance.
Contribution
The paper introduces an adaptive block selection approach based on gradient norms and Dirichlet sampling, which outperforms existing PEFT methods such as LoRA in efficiency and accuracy.
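To make the gradient-norm part of this idea concrete, here is a minimal sketch of ranking transformer blocks by gradient norm and keeping the top k. It is an illustration only, not the authors' implementation: the function names `block_grad_norms` and `select_top_blocks`, the toy gradient values, and the use of a single L2 norm per block are all assumptions made for this example.

```python
import math

def block_grad_norms(block_grads):
    """Return the L2 norm of all gradient values in each block.

    block_grads maps a (hypothetical) block name to a list of flat
    gradient lists, one list per parameter tensor in that block.
    """
    return {
        name: math.sqrt(sum(g * g for tensor in tensors for g in tensor))
        for name, tensors in block_grads.items()
    }

def select_top_blocks(norms, k):
    """Return the names of the k blocks with the largest gradient norm."""
    ranked = sorted(norms, key=norms.get, reverse=True)
    return ranked[:k]

# Toy gradients for a hypothetical 4-block model (values are made up).
toy_grads = {
    "block_0": [[0.1, -0.2], [0.05]],
    "block_1": [[3.0, 4.0]],
    "block_2": [[0.0, 0.0], [0.0]],
    "block_3": [[1.0, -1.0], [1.0, 1.0]],
}

norms = block_grad_norms(toy_grads)
selected = select_top_blocks(norms, k=2)
print(selected)
```

In a real training loop, only the selected blocks would receive optimizer updates for that step, while the remaining blocks stay frozen.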
Findings
Trains 12% faster with 35% less GPU memory.
Outperforms LoRA by 3% on the GSM8K dataset.
Achieves similar accuracy on the MATH dataset.
Abstract
Large Language Models (LLMs) can perform many NLP tasks well, but fully fine-tuning them is expensive and requires a lot of memory. Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA reduce this cost by adding small low-rank updates to frozen model weights. However, these methods restrict the training to a limited subspace, which can sometimes reduce performance. For Small Language Models (SLMs), where efficiency gains matter even more, we introduce AdaGradSelect, an adaptive method that selects which transformer blocks to update based on gradients. Early observations showed that updating only the transformer blocks with the highest gradient norms can achieve performance close to full fine-tuning. Building on this insight, AdaGradSelect adaptively chooses which blocks to train. It uses a combination of Dirichlet-based sampling, which depends on how frequently blocks were…
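The Dirichlet-based sampling mentioned above can be sketched roughly as follows. The abstract is cut off here, so this is a guess at the general shape rather than the paper's actual procedure: it assumes the concentration parameter of each block is a prior plus the number of times that block was previously updated, and that k distinct blocks are then drawn from the sampled probabilities. The names `sample_dirichlet`, `sample_blocks`, `update_counts`, and `prior` are hypothetical.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def sample_blocks(update_counts, k, rng, prior=1.0):
    """Pick k distinct block indices using Dirichlet-sampled probabilities.

    Assumption (not from the source): concentration = prior + past update
    count, so blocks updated more often are more likely to be picked again.
    """
    alphas = [prior + c for c in update_counts]
    probs = sample_dirichlet(alphas, rng)
    remaining = list(range(len(probs)))
    chosen = []
    for _ in range(k):
        # Weighted draw among the blocks not yet chosen.
        weights = [probs[i] for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen, probs

rng = random.Random(0)          # fixed seed so the run is repeatable
counts = [5, 0, 12, 1]          # made-up update counts for 4 blocks
chosen, probs = sample_blocks(counts, k=2, rng=rng)
print(chosen, [round(p, 3) for p in probs])
```

Because the probabilities are resampled on every call, blocks with low counts still have a chance of being selected, which is one plausible reason to prefer sampling over a fixed top-k rule.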
Taxonomy
Topics: Topic Modeling · Machine Learning and Data Classification · Computational and Text Analysis Methods
