From Insight to Action: A Novel Framework for Interpretability-Guided Data Selection in Large Language Models
Ling Shi, Xinwei Wu, Xiaohu Zhao, Hao Wang, Heng Liu, Yangyang Liu, Linlong Xu, Longyue Wang, Deyi Xiong, Weihua Luo

TL;DR
This paper introduces IGDS, a framework that uses interpretability tools to select data that maximally activates internal task features, improving model fine-tuning efficiency across multiple tasks and models.
Contribution
The paper presents a novel interpretability-guided data selection method that enhances LLM training by leveraging internal feature activation, demonstrating significant data efficiency gains.
Findings
IGDS outperforms full-dataset fine-tuning on Math tasks while using only 50% of the data.
On Math tasks, IGDS surpasses baseline performance by 17.4% on Gemma-2-2B.
Feature amplification correlates strongly with task performance improvement.
Abstract
While mechanistic interpretability tools like Sparse Autoencoders (SAEs) can uncover meaningful features within Large Language Models (LLMs), a critical gap remains in transforming these insights into practical actions for model optimization. We bridge this gap with the hypothesis that data selection guided by a model's internal task features is an effective training strategy. Inspired by this, we propose Interpretability-Guided Data Selection (IGDS), a framework that first identifies these causal task features through frequency recall and interventional filtering, then selects "Feature-Resonant Data" that maximally activates task features for fine-tuning. We validate IGDS on mathematical reasoning, summarization, and translation tasks within Gemma-2, LLaMA-3.1, and Qwen3 models. Our experiments demonstrate exceptional data efficiency: on the Math task, IGDS surpasses full-dataset…
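The selection step described in the abstract (keeping the examples that most strongly activate the identified task features) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the mean-activation scoring rule, the toy activation values, and the choice of task feature indices are all assumptions made for illustration. The sketch also assumes that per-example SAE feature activations have already been computed and that the task features have already been identified.

```python
from typing import List, Sequence


def resonance_scores(activations: Sequence[Sequence[float]],
                     task_features: Sequence[int]) -> List[float]:
    """Score each example by its mean activation over the task features.

    Mean activation is an assumed scoring rule, not taken from the paper.
    """
    return [sum(row[j] for j in task_features) / len(task_features)
            for row in activations]


def select_feature_resonant(activations: Sequence[Sequence[float]],
                            task_features: Sequence[int],
                            fraction: float = 0.5) -> List[int]:
    """Return indices of the top `fraction` of examples by resonance score."""
    scores = resonance_scores(activations, task_features)
    k = max(1, int(len(scores) * fraction))
    # Rank example indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Return the kept indices in their original dataset order.
    return sorted(ranked[:k])


# Hypothetical toy data: 4 examples x 3 SAE features.
# Features 0 and 2 are treated as the task features (an assumption).
toy_activations = [
    [0.9, 0.1, 0.8],
    [0.0, 0.7, 0.1],
    [0.4, 0.0, 0.6],
    [0.1, 0.9, 0.0],
]
selected = select_feature_resonant(toy_activations, task_features=[0, 2], fraction=0.5)
print(selected)
```

With `fraction=0.5`, the sketch keeps the half of the toy dataset whose task-feature activations are highest; the retained indices would then define the fine-tuning subset.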
