Learn More, Forget Less: A Gradient-Aware Data Selection Approach for LLM

Yibai Liu; Shihang Wang; Zeming Liu; Zheming Song; Junzhe Wang; Jingjing Liu; Qingjie Liu; Yunhong Wang

arXiv:2511.08620 · cs.CL · November 13, 2025

Open Access

TL;DR

This paper introduces GrADS, a gradient-aware data selection method for fine-tuning large language models that improves efficiency and reduces catastrophic forgetting by selecting the most impactful training examples based on gradient analysis.

Contribution

The paper presents a novel self-adaptive gradient-aware data selection approach (GrADS) that enhances domain-specific fine-tuning of LLMs by identifying effective training data through gradient analysis.

Findings

01. Fine-tuning on only 5% of the GrADS-selected data surpasses fine-tuning on the full dataset.

02. Increasing the selected share to 50% yields significant improvements.

03. The method reduces resource consumption and mitigates catastrophic forgetting.

Abstract

Although large language models (LLMs) have achieved impressive results across numerous tasks, supervised fine-tuning (SFT) remains essential for adapting these models to specialized domains. However, SFT for domain specialization can be resource-intensive and sometimes degrades general capabilities due to catastrophic forgetting (CF). To address these issues, we propose a self-adaptive gradient-aware data selection approach (GrADS) for supervised fine-tuning of LLMs, which identifies effective subsets of training data by analyzing gradients obtained from a preliminary training phase. Specifically, we design self-guided criteria that leverage the magnitude and statistical distribution of gradients to prioritize examples that contribute the most to the model's learning process. This approach enables the acquisition of representative samples that…
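The general idea of ranking training examples by gradient magnitude after a preliminary training phase can be illustrated with a toy sketch. This is not the GrADS criterion itself, which the abstract only outlines. It assumes a small logistic-regression model in place of an LLM, synthetic data, a plain L2 gradient norm as the score, and a fixed 5% budget. The function names `per_example_grad_norms` and `select_top_fraction` are hypothetical.

```python
import numpy as np

def per_example_grad_norms(w, X, y):
    """L2 norm of the logistic-loss gradient of each example w.r.t. w."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probabilities, shape (n,)
    grads = (p - y)[:, None] * X          # per-example gradients, shape (n, d)
    return np.linalg.norm(grads, axis=1)

def select_top_fraction(scores, fraction):
    """Indices of the highest-scoring examples (assumed selection rule)."""
    k = max(1, int(round(fraction * len(scores))))
    return np.argsort(scores)[::-1][:k]

# Synthetic stand-in for a fine-tuning dataset (assumption, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(float)

# Preliminary training phase: a few steps of full-batch gradient descent.
w = np.zeros(5)
for _ in range(20):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.1 * X.T @ (p - y) / len(y)

# Score every example by gradient magnitude and keep the top 5%.
norms = per_example_grad_norms(w, X, y)
selected = select_top_fraction(norms, 0.05)
print(len(selected), "examples selected out of", len(y))
```

In this sketch the selected examples are simply those the partially trained model still fits worst. A criterion that also uses the statistical distribution of gradients, as the abstract describes, would replace the plain top-k rule.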

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Domain Adaptation and Few-Shot Learning