AdaGradSelect: An adaptive gradient-guided layer selection method for efficient fine-tuning of SLMs

Anshul Kumar; Gagan Raj Gupta; Manisha Chawla

arXiv:2512.15764 · cs.LG · December 19, 2025

Open Access

TL;DR

AdaGradSelect is an adaptive layer selection method that improves the fine-tuning efficiency of small language models (SLMs) by updating only the transformer blocks judged most important by their gradients, reducing training time and memory usage while keeping performance close to full fine-tuning.
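As a rough illustration of selecting blocks by gradient importance, the sketch below ranks transformer blocks by the L2 norm of their gradients and keeps the top k. This matches the abstract's observation that updating only the highest-gradient-norm blocks can approach full fine-tuning. It is a minimal stdlib-only sketch, not the authors' code: the flat per-block gradient lists, the choice of the L2 norm, the value of k, and the helper names `block_grad_norms` and `select_top_k` are all hypothetical.

```python
import math


def block_grad_norms(block_grads):
    """Return the L2 norm of each block's gradient.

    block_grads: hypothetical mapping from block index to a flat list of
    gradient values for every parameter in that transformer block.
    """
    return {b: math.sqrt(sum(g * g for g in grads)) for b, grads in block_grads.items()}


def select_top_k(block_grads, k):
    """Return the (sorted) indices of the k blocks with the largest gradient norms."""
    norms = block_grad_norms(block_grads)
    ranked = sorted(norms, key=norms.get, reverse=True)
    return sorted(ranked[:k])


# Toy gradients for a hypothetical 4-block model.
grads = {
    0: [0.01, -0.02, 0.005],
    1: [0.30, 0.10, -0.20],
    2: [0.05, 0.04, -0.01],
    3: [-0.25, 0.15, 0.05],
}
print(select_top_k(grads, 2))  # prints [1, 3]
```

In a real training loop, the selected blocks would be unfrozen for the next update and the rest kept frozen. How often AdaGradSelect re-ranks blocks is not stated on this page.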

Contribution

It introduces an adaptive block selection approach, based on gradient norms and Dirichlet sampling, that outperforms existing PEFT methods such as LoRA in efficiency and accuracy.
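The Dirichlet sampling component can be sketched as follows. The abstract says only that the sampling depends on how frequently blocks were updated; the rest of that sentence is truncated on this page. The mapping used here therefore rests on hypothetical choices made purely for illustration: the concentration for each block is `prior` plus its update count, and the k blocks with the largest sampled weights are kept. `sample_dirichlet` uses the standard construction of normalising independent Gamma(alpha_b, 1) draws.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one sample from Dirichlet(alphas) by normalising Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def dirichlet_select(update_counts, k, prior=1.0, rng=None):
    """Pick k block indices using Dirichlet weights built from update counts.

    Hypothetical mapping: block b gets concentration prior + update_counts[b],
    so frequently updated blocks tend to draw larger weights.
    """
    rng = rng if rng is not None else random.Random()
    alphas = [prior + c for c in update_counts]
    weights = sample_dirichlet(alphas, rng)
    # Keep the k blocks with the largest sampled weights (hypothetical rule).
    ranked = sorted(range(len(weights)), key=lambda b: weights[b], reverse=True)
    return sorted(ranked[:k]), weights


counts = [0, 12, 3, 9]  # hypothetical per-block update counts so far
chosen, weights = dirichlet_select(counts, k=2, rng=random.Random(0))
print(chosen, [round(w, 3) for w in weights])

# After a training step, the chosen blocks' counts would be incremented.
for b in chosen:
    counts[b] += 1
```

Because the weights are random, a rarely updated block can still be chosen occasionally. This is the usual motivation for sampling rather than always keeping the same top blocks.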

Findings

01. Trains 12% faster with 35% less GPU memory.
02. Outperforms LoRA by 3% on the GSM8K dataset.
03. Achieves similar accuracy on the MATH dataset.

Abstract

Large Language Models (LLMs) can perform many NLP tasks well, but fully fine-tuning them is expensive and requires a lot of memory. Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA reduce this cost by adding small low-rank updates to frozen model weights. However, these methods restrict the training to a limited subspace, which can sometimes reduce performance. For Small Language Models (SLMs), where efficiency gains matter even more, we introduce AdaGradSelect, an adaptive method that selects which transformer blocks to update based on gradients. Early observations showed that updating only the transformer blocks with the highest gradient norms can achieve performance close to full fine-tuning. Building on this insight, AdaGradSelect adaptively chooses which blocks to train. It uses a combination of Dirichlet-based sampling, which depends on how frequently blocks were…
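The abstract notes that LoRA restricts training to a limited subspace. The standard LoRA formulation makes this concrete by writing the adapted weight as a frozen matrix plus a scaled low-rank product. This formulation comes from the original LoRA work, not from AdaGradSelect, and the dimensions in the last line are illustrative only, not taken from any model evaluated in this paper.

```latex
\begin{aligned}
W &= W_0 + \Delta W = W_0 + \tfrac{\alpha}{r} B A, \quad
  W_0 \in \mathbb{R}^{d \times k},\ B \in \mathbb{R}^{d \times r},\ A \in \mathbb{R}^{r \times k},\ r \ll \min(d, k) \\
\operatorname{rank}(\Delta W) &\le r, \qquad \text{trainable parameters: } r(d + k) \text{ instead of } dk \\
d = k = 4096,\ r = 8 &: \quad 8 \cdot (4096 + 4096) = 65{,}536 \ \text{vs.}\ 4096^2 = 16{,}777{,}216 \ (\approx 0.39\%)
\end{aligned}
```

Every LoRA update therefore lies in a matrix of rank at most r. As described in the abstract, AdaGradSelect instead chooses which transformer blocks to update, rather than constraining the update to a low-rank product.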

Taxonomy

Topics: Topic Modeling · Machine Learning and Data Classification · Computational and Text Analysis Methods