CircuitSeer: Mining High-Quality Data by Probing Mathematical Reasoning Circuits in LLMs
Shaobo Wang, Yongliang Miao, Yuancheng Liu, Qianli Ma, Ning Liao, Linfeng Zhang

TL;DR
CircuitSeer is a novel data selection approach that identifies and leverages core reasoning circuits within LLMs to efficiently select high-quality training data, improving reasoning performance with less data.
Contribution
This work introduces CircuitSeer, a new method that uses internal model circuits to select data, reducing reliance on external heuristics and costly models.
Findings
Selective data training improves reasoning accuracy.
CircuitSeer outperforms existing data selection methods.
High-quality reasoning data can be identified via internal model analysis.
Abstract
Large language models (LLMs) have demonstrated impressive reasoning capabilities, but scaling their performance often relies on massive reasoning datasets that are computationally expensive to train on. Existing data selection methods aim to curate smaller, high-quality subsets but often rely on costly external models or opaque heuristics. In this work, we shift the focus from external heuristics to the model's internal mechanisms. We find that complex reasoning tasks consistently activate a sparse, specialized subset of attention heads, forming core reasoning circuits. Building on this insight, we propose CircuitSeer, a novel data selection method that quantifies the reasoning complexity of data by measuring its influence on these crucial circuits. Extensive experiments on 4 models and 9 datasets demonstrate CircuitSeer's superiority. Notably, fine-tuning Qwen2.5-Math-7B on just 10% of…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Mathematics, Computing, and Information Processing · Machine Learning in Materials Science
