GRASS: Gradient-based Adaptive Layer-wise Importance Sampling for Memory-efficient Large Language Model Fine-tuning

Kaiyuan Tian, Yu Tang, Gongqingjian Jiang, Baihui Liu, Yifu Gao, Xialin Su, Linbo Qiao, Dongsheng Li

arXiv:2604.07808·cs.CL·April 10, 2026

TL;DR

GRASS is a gradient-based adaptive importance sampling method that improves memory efficiency and performance in large language model fine-tuning by dynamically estimating layer importance.

Contribution

It introduces a task-aware and training-stage-aware layer importance metric, together with an adaptive layer sampling strategy for layer-wise fine-tuning, and outperforms existing methods.

Findings

01

Achieves an accuracy improvement of up to 4.38 points.

02

Reduces memory usage by up to 19.97%.

03

Outperforms state-of-the-art methods across multiple benchmarks.

Abstract

Full-parameter fine-tuning of large language models is constrained by substantial GPU memory requirements. Low-rank adaptation methods mitigate this challenge by updating only a subset of parameters. However, these approaches often limit model expressiveness and yield lower performance than full-parameter fine-tuning. Layer-wise fine-tuning methods have emerged as an alternative, enabling memory-efficient training through static layer importance sampling strategies. However, these methods overlook variations in layer importance across tasks and training stages, resulting in suboptimal performance on downstream tasks. To address these limitations, we propose GRASS, a gradient-based adaptive layer-wise importance sampling framework. GRASS utilizes mean gradient norms as a task-aware and training-stage-aware metric for estimating layer importance. Furthermore, GRASS adaptively adjusts…
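The abstract names mean gradient norms as the layer-importance signal but is cut off before it describes how GRASS adaptively adjusts sampling. The Python sketch below shows one plausible reading under stated assumptions: each layer's importance is the mean L2 norm of its gradient over a short sliding window of recent steps, and the layers to unfreeze are drawn without replacement with probability proportional to that score. The window length, the `eps` smoothing term, and the names `LayerImportanceTracker` and `sample_layers` are hypothetical choices for illustration, not the authors' implementation.

```python
# Illustrative sketch only, not the GRASS reference implementation.
# Assumption: importance = mean per-layer gradient L2 norm over a sliding window,
# and layers are sampled without replacement in proportion to that importance.
import math
import random
from collections import deque


def grad_l2_norm(grads):
    """L2 norm of one layer's gradient, given as a flat list of floats."""
    return math.sqrt(sum(g * g for g in grads))


class LayerImportanceTracker:
    """Hypothetical helper: stores recent per-layer gradient norms and
    reports their mean as each layer's importance score."""

    def __init__(self, num_layers, window=4):
        # Each deque keeps at most `window` recent norms for one layer.
        self.history = [deque(maxlen=window) for _ in range(num_layers)]

    def update(self, layer_grads):
        # layer_grads[i] is the flattened gradient of layer i at this step,
        # or None if layer i was frozen and produced no gradient.
        for hist, grads in zip(self.history, layer_grads):
            if grads is not None:
                hist.append(grad_l2_norm(grads))

    def importance(self):
        # Mean gradient norm per layer; 0.0 for layers never observed.
        return [sum(h) / len(h) if h else 0.0 for h in self.history]


def sample_layers(importance, k, rng, eps=1e-8):
    """Draw up to k distinct layer indices; each draw is proportional to importance."""
    candidates = list(range(len(importance)))
    weights = [w + eps for w in importance]  # eps keeps every layer reachable
    chosen = []
    for _ in range(min(k, len(candidates))):
        idx = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.append(candidates.pop(idx))
        weights.pop(idx)  # remove the drawn layer so it cannot be picked twice
    return sorted(chosen)


if __name__ == "__main__":
    rng = random.Random(0)
    tracker = LayerImportanceTracker(num_layers=4, window=2)
    # Layer 2 was frozen at this step, so it contributes no gradient.
    tracker.update([[0.1, 0.1], [1.0, 2.0], None, [3.0, 4.0]])
    print(sample_layers(tracker.importance(), k=2, rng=rng))
```

Passing `None` for a frozen layer leaves its previous estimate untouched, since a layer that is not being trained produces no gradient. How GRASS itself refreshes estimates for layers that are not currently sampled is not stated in this excerpt.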