Not All Correct Answers Are Equal: Why Your Distillation Source Matters

Xiaoyu Tian; Yunjie Ji; Haotian Wang; Shuaiting Chen; Sitong Zhao; Yiping Peng; Han Zhao; Xiangang Li

arXiv:2505.14464·cs.CL·May 23, 2025



TL;DR

This paper conducts a large-scale empirical study on reasoning data distillation from multiple teacher models, revealing that high-quality, verified reasoning traces significantly improve the reasoning capabilities of student language models.

Contribution

It introduces and analyzes three parallel reasoning datasets distilled from state-of-the-art models, highlighting the superior performance of data from AM-Thinking-v1 and releasing these datasets publicly.

Findings

01. AM-Thinking-v1-distilled data shows greater token length diversity.

02. Models trained on AM-Thinking-v1 data outperform others on reasoning benchmarks.

03. Distilled datasets improve reasoning performance and output adaptability.

Abstract

Distillation has emerged as a practical and effective approach to enhance the reasoning capabilities of open-source language models. In this work, we conduct a large-scale empirical study on reasoning data distillation by collecting verified outputs from three state-of-the-art teacher models (AM-Thinking-v1, Qwen3-235B-A22B, and DeepSeek-R1) on a shared corpus of 1.89 million queries. We construct three parallel datasets and analyze their distributions, revealing that AM-Thinking-v1-distilled data exhibits greater token length diversity and lower perplexity. Student models trained on each dataset are evaluated on reasoning benchmarks including AIME2024, AIME2025, MATH500, and LiveCodeBench. The model distilled from AM-Thinking-v1 consistently achieves the best performance (e.g., 84.3 on AIME2024, 72.2 on AIME2025, 98.4 on MATH500, and 65.9 on LiveCodeBench) and demonstrates adaptive…
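
As a rough illustration of the kind of distribution analysis the abstract mentions, the sketch below compares the spread of response lengths across parallel datasets built from the same queries. It is not the authors' code: the whitespace tokenizer, the toy responses, the teacher labels, and the helper name `length_stats` are all assumptions made for illustration, and a real analysis would use the student model's tokenizer on the full corpus.

```python
import statistics


def length_stats(responses):
    """Summarize token lengths for one dataset of responses.

    Whitespace splitting stands in for a real tokenizer (an assumption).
    """
    lengths = [len(r.split()) for r in responses]
    return {
        "mean": statistics.mean(lengths),
        # Population standard deviation as a simple measure of length diversity.
        "stdev": statistics.pstdev(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }


# Hypothetical toy data: the same two queries answered by two teachers.
toy = {
    "teacher_a": [
        "x = 2",
        "first expand the square then collect terms so x = 2",
    ],
    "teacher_b": [
        "the answer is x = 2",
        "expanding gives x = 2",
    ],
}

stats = {name: length_stats(resps) for name, resps in toy.items()}
for name, s in stats.items():
    print(name, s)
```

In this toy setup, a larger `stdev` for one teacher would indicate that its responses vary more in length across queries, which is one simple way to read "token length diversity".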

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks