Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation
Kaiyuan Liu, Shaotian Yan, Rui Miao, Bing Wang, Chen Shen, Jun Zhang, Jieping Ye

TL;DR
This paper introduces a provenance tracing framework to analyze how reasoning distillation transfers capabilities from teacher to student models, revealing that students can generate teacher-originated actions in test contexts and proposing a principled data selection method.
Contribution
It presents a novel provenance tracing framework for reasoning distillation and a teacher-guided data selection method that improves model training.
Findings
Students can generate teacher-originated actions in test contexts.
The provenance framework effectively disentangles the origins of model actions.
The data selection method outperforms heuristic approaches.
Abstract
Reasoning distillation has attracted increasing attention. It typically leverages a large teacher model to generate reasoning paths, which are then used to fine-tune a student model so that it mimics the teacher's behavior in training contexts. However, previous approaches have lacked a detailed analysis of the origins of the distilled model's capabilities. It remains unclear whether the student can maintain consistent behaviors with the teacher in novel test-time contexts, or whether it regresses to its original output patterns, raising concerns about the generalization of distillation models. To analyse this question, we introduce a cross-model Reasoning Distillation Provenance Tracing framework. For each action (e.g., a sentence) produced by the distilled model, we obtain the predictive probabilities assigned by the teacher, the original student, and the distilled model under the…
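The comparison described in the abstract can be illustrated with a minimal sketch. It is not the paper's actual rule: the function names, the length-normalised scoring, the two-way gap test, and the default thresholds `alpha` and `beta` are all assumptions made for illustration (the reviews below only mention that thresholds α and β and probability gaps are involved). The sketch also assumes that the teacher and the original student have already scored the same action on the same preceding context, and it leaves out the distilled model's own probability.

```python
from typing import Sequence


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalised log-probability of one action (e.g. a sentence)."""
    if not token_logprobs:
        raise ValueError("an action needs at least one token")
    return sum(token_logprobs) / len(token_logprobs)


def classify_action(
    teacher_logprobs: Sequence[float],
    student_logprobs: Sequence[float],
    alpha: float = 0.5,  # hypothetical threshold, not taken from the paper
    beta: float = 0.5,   # hypothetical threshold, not taken from the paper
) -> str:
    """Label an action by the gap between teacher and original-student scores.

    Hypothetical rule: a large positive gap means the teacher finds the action
    far more likely than the original student did, so it is labelled
    teacher-originated; a large negative gap means the opposite.
    """
    gap = mean_logprob(teacher_logprobs) - mean_logprob(student_logprobs)
    if gap >= alpha:
        return "teacher-originated"
    if gap <= -beta:
        return "student-originated"
    return "shared"


if __name__ == "__main__":
    # Per-token log-probabilities of one sentence emitted by the distilled
    # model, as scored by the teacher and by the original student.
    print(classify_action([-0.1, -0.2], [-1.5, -2.0]))
```

In practice the per-token log-probabilities would come from running each model over the same prefix plus the generated sentence; how that is done depends on the inference stack and is outside this sketch.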
Peer Reviews
Decision: ICLR 2026 Poster
- A timely, yet interesting topic - Well written - Several qualitatively interesting experimental results, and promising results for the newly proposed data selection method
- Although I'm not an expert in this field, it is still clear that the paper is lacking discussions of prior literature on knowledge distillation and model auditing/provenance [1,2]. Additionally, core literature on distilling LLM reasoning capabilities [3,4] is lacking. - Can the proposed provenance methodology be used to identify which teacher model distilled the reasoning capability? [1] https://dl.acm.org/doi/10.1145/3292500.3330885 [2] https://openreview.net/forum?id=TatRHT_1cK [3] http
- The paper is well organized and well written - RDPT introduces an interpretable framework for tracing the origin of reasoning steps, bridging the gap between explainability and distillation efficiency. - Proposes a simple yet effective teacher-guided data selection method.
- Lack of validation on larger-scale (e.g., 70B+) models leaves scalability in question - The proposed teacher-guided selection requires re-feeding large corpora through both models for probability extraction, so a cost analysis is needed - The method is primarily validated on short- to medium-length reasoning traces, leaving its effectiveness in long-range or multi-step reasoning scenarios uncertain.
- Provides a novel analytical framework (RDPT) that quantitatively traces the provenance of reasoning actions, offering interpretability to the distillation process. - Empirical evidence that distilled models reproduce teacher-originated behaviors in unseen contexts, explaining why reasoning distillation generalizes. - The proposed teacher-guided data selection is simple yet principled, improving performance across diverse teacher–student configurations and datasets. - Strong experimental design
- The proposed framework (RDPT) is primarily analytical and diagnostic rather than methodological. It provides interpretation of distillation outcomes but does not introduce new mechanisms that improve reasoning capability. - The provenance classification relies on manually set thresholds (α, β) and simple probability gaps between teacher and student. This rule-based design is heuristic and potentially unstable across datasets or model scales. - The definition of “teacher-originated” actions is
Code & Models
- 🤗 Alibaba-Apsara/DASD-4B-Thinking · model · 462 dl · ♡ 216
- 🤗 Alibaba-Apsara/DASD-30B-A3B-Thinking-Preview · model · 146 dl · ♡ 52
- 🤗 cyankiwi/DASD-30B-A3B-Thinking-Preview-AWQ-4bit · model · 2 dl
- 🤗 cyankiwi/DASD-30B-A3B-Thinking-Preview-AWQ-8bit · model · 2 dl · ♡ 1
- 🤗 cyankiwi/DASD-4B-Thinking-AWQ-4bit · model · 16 dl · ♡ 1
- 🤗 cyankiwi/DASD-4B-Thinking-AWQ-8bit · model · 1 dl
- 🤗 aashish1904/DASD-4B-Thinking-GGUF · model · 57 dl · ♡ 3
- 🤗 Mungert/DASD-4B-Thinking-GGUF · model · 107 dl
- Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b-Logprob · dataset · 1.3k dl
- Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b · dataset · 2.5k dl
- Vanguminh69/Superior-Reasoning-SFT-gpt-oss-120b · dataset · 3 dl
- prabinh/Superior-Reasoning-SFT-gpt-oss-120b · dataset · 68 dl
- rico2512/Superior-Reasoning-SFT-gpt-oss-120b · dataset · 44 dl
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Explainable Artificial Intelligence (XAI) · Topic Modeling
