From Insight to Action: A Novel Framework for Interpretability-Guided Data Selection in Large Language Models

Ling Shi; Xinwei Wu; Xiaohu Zhao; Hao Wang; Heng Liu; Yangyang Liu; Linlong Xu; Longyue Wang; Deyi Xiong; Weihua Luo

arXiv:2604.25167·cs.AI·April 29, 2026

TL;DR

This paper introduces IGDS, a framework that uses interpretability tools to select data that maximally activates internal task features, improving model fine-tuning efficiency across multiple tasks and models.

Contribution

The paper presents a novel interpretability-guided data selection method that enhances LLM training by leveraging internal feature activation, demonstrating significant data efficiency gains.

Findings

01

IGDS outperforms full-dataset fine-tuning on Math tasks while using only 50% of the data.

02

On Math tasks, IGDS surpasses baseline performance by 17.4% on Gemma-2-2B.

03

Feature amplification correlates strongly with task performance improvement.

Abstract

While mechanistic interpretability tools like Sparse Autoencoders (SAEs) can uncover meaningful features within Large Language Models (LLMs), a critical gap remains in transforming these insights into practical actions for model optimization. We bridge this gap with the hypothesis that data selection guided by a model's internal task features is an effective training strategy. Inspired by this, we propose Interpretability-Guided Data Selection (IGDS), a framework that first identifies these causal task features through frequency recall and interventional filtering, then selects "Feature-Resonant Data" that maximally activates task features for fine-tuning. We validate IGDS on mathematical reasoning, summarization, and translation tasks within Gemma-2, LLaMA-3.1, and Qwen3 models. Our experiments demonstrate exceptional data efficiency: on the Math task, IGDS surpasses full-dataset…
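The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that SAE feature activations have already been computed per training example and that a set of task feature indices has already been identified; the frequency recall and interventional filtering stages are not reproduced here. The function names, the mean-activation score, and the toy numbers are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def resonance_scores(activations: Sequence[Sequence[float]],
                     task_features: Sequence[int]) -> List[float]:
    """Score each example by its mean activation over the task features.

    Mean activation is an assumed scoring rule, not one taken from the paper.
    """
    if not task_features:
        raise ValueError("task_features must not be empty")
    return [sum(row[j] for j in task_features) / len(task_features)
            for row in activations]


def select_top_fraction(scores: Sequence[float], fraction: float) -> List[int]:
    """Return the indices of the highest-scoring examples, in dataset order."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    keep = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


# Toy data: 4 examples x 3 SAE features (made-up numbers, not from the paper).
toy_activations = [
    [0.9, 0.1, 0.8],
    [0.0, 0.7, 0.1],
    [0.5, 0.2, 0.6],
    [0.1, 0.9, 0.0],
]
toy_task_features = [0, 2]  # assumed output of the feature-identification stage

scores = resonance_scores(toy_activations, toy_task_features)
selected = select_top_fraction(scores, fraction=0.5)
print(selected)  # [0, 2]
```

Examples 0 and 2 are kept because they activate the assumed task features most strongly. In a real pipeline the activation matrix would come from running the model with an SAE attached, and the kept fraction would be a tuning choice.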
