CircuitSeer: Mining High-Quality Data by Probing Mathematical Reasoning Circuits in LLMs

Shaobo Wang; Yongliang Miao; Yuancheng Liu; Qianli Ma; Ning Liao; Linfeng Zhang

arXiv:2510.18470 · cs.AI · October 24, 2025

TL;DR

CircuitSeer is a data selection approach that identifies the core reasoning circuits inside an LLM (a sparse, specialized subset of attention heads) and uses them to select high-quality training data, improving reasoning performance while training on less data.

Contribution

This work introduces CircuitSeer, a data selection method that relies on the model's own internal circuits rather than on external heuristics or costly external models.

Findings

1. Training on a selected subset of the data improves reasoning accuracy.
2. CircuitSeer outperforms existing data selection methods.
3. High-quality reasoning data can be identified by analyzing the model's internal mechanisms.

Abstract

Large language models (LLMs) have demonstrated impressive reasoning capabilities, but scaling their performance often relies on massive reasoning datasets that are computationally expensive to train on. Existing data selection methods aim to curate smaller, high-quality subsets but often rely on costly external models or opaque heuristics. In this work, we shift the focus from external heuristics to the model's internal mechanisms. We find that complex reasoning tasks consistently activate a sparse, specialized subset of attention heads, forming core reasoning circuits. Building on this insight, we propose CircuitSeer, a novel data selection method that quantifies the reasoning complexity of data by measuring its influence on these crucial circuits. Extensive experiments on 4 models and 9 datasets demonstrate CircuitSeer's superiority. Notably, fine-tuning Qwen2.5-Math-7B on just 10% of…
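
The abstract describes the selection rule only at a high level, so the following is a hypothetical illustration, not the authors' implementation. It assumes a precomputed per-head statistic for every sample (for example, how strongly the sample engages each attention head), picks the heads whose statistic rises most on reasoning prompts relative to baseline prompts, scores each candidate by its mean statistic over those heads, and keeps the top fraction of the pool. The function names, the head-selection rule, and the toy numbers are all assumptions made for illustration.

```python
# Hypothetical sketch of circuit-based data selection. NOT the CircuitSeer code:
# the per-head statistic, the head-selection rule, and all names are assumptions.
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # rows = samples, columns = attention heads


def _column_mean(rows: Matrix, head: int) -> float:
    """Mean statistic of one head across a set of samples."""
    return sum(row[head] for row in rows) / len(rows)


def find_reasoning_heads(reasoning: Matrix, baseline: Matrix, k: int) -> List[int]:
    """Return the k heads whose mean statistic rises most on reasoning prompts
    relative to baseline prompts (an assumed proxy for a 'reasoning circuit')."""
    n_heads = len(reasoning[0])
    gaps = [
        (_column_mean(reasoning, h) - _column_mean(baseline, h), h)
        for h in range(n_heads)
    ]
    gaps.sort(reverse=True)  # largest gap first
    return sorted(h for _, h in gaps[:k])


def circuit_score(sample: Sequence[float], heads: Sequence[int]) -> float:
    """Score one candidate sample by its mean statistic over the circuit heads."""
    return sum(sample[h] for h in heads) / len(heads)


def select_subset(pool: Matrix, heads: Sequence[int], fraction: float) -> List[int]:
    """Keep the top `fraction` of the pool by circuit score; return sorted indices."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = max(1, int(len(pool) * fraction))
    ranked = sorted(
        range(len(pool)), key=lambda i: circuit_score(pool[i], heads), reverse=True
    )
    return sorted(ranked[:n_keep])


# Toy data: 4 heads; the values are made up purely for illustration.
reasoning_probe = [[0.9, 0.1, 0.8, 0.2], [0.7, 0.2, 0.9, 0.1]]
baseline_probe = [[0.2, 0.1, 0.3, 0.2], [0.1, 0.2, 0.2, 0.1]]
candidate_pool = [
    [0.1, 0.5, 0.2, 0.4],
    [0.9, 0.1, 0.8, 0.3],
    [0.4, 0.9, 0.3, 0.9],
    [0.6, 0.2, 0.7, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]

heads = find_reasoning_heads(reasoning_probe, baseline_probe, k=2)
chosen = select_subset(candidate_pool, heads, fraction=0.4)
print(heads, chosen)  # [0, 2] [1, 3]
```

Ranking heads by a mean-difference gap and scoring samples by a plain average are the simplest choices consistent with the abstract's wording; the paper may identify heads, weight them, or measure a sample's influence on the circuit differently.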
