Reasoning Compression with Mixed-Policy Distillation

Han Yang; Mingyan Wu; Bailan He; Zeyu Cao; Sikuan Yan; Kevin Qinghong Lin; Zifeng Ding

arXiv:2605.08776·cs.AI·May 12, 2026

TL;DR

This paper introduces Mixed-Policy Distillation (MPD), a framework that transfers concise reasoning behavior from a large teacher model to a smaller student, reducing token usage while improving reasoning performance.

Contribution

The paper proposes MPD, a new distillation method that combines on-policy and off-policy approaches to effectively compress reasoning traces from large to small models.

Findings

01. MPD reduces token usage by up to 27.1%.

02. MPD improves reasoning benchmark performance.

03. MPD effectively transfers reasoning compression from large to small models.
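
The token-usage figure in the first finding reads as a relative reduction. As a rough illustration of how such a percentage is commonly computed, the sketch below compares mean tokens per response before and after compression. The function name and the token counts are hypothetical and are not taken from the paper, which may measure token usage differently.

```python
def relative_token_reduction(baseline_tokens, compressed_tokens):
    """Return the percentage drop in mean tokens per response.

    Assumption: both lists hold per-response token counts on the same
    problem set (this is an illustrative metric, not the paper's code).
    """
    if not baseline_tokens or not compressed_tokens:
        raise ValueError("both token-count lists must be non-empty")
    baseline_mean = sum(baseline_tokens) / len(baseline_tokens)
    compressed_mean = sum(compressed_tokens) / len(compressed_tokens)
    return 100.0 * (baseline_mean - compressed_mean) / baseline_mean


# Hypothetical token counts, for illustration only.
baseline = [1000, 1200, 800]    # student before distillation
compressed = [750, 900, 600]    # student after distillation
reduction = relative_token_reduction(baseline, compressed)
print(f"{reduction:.1f}% fewer tokens")
```

A reduction reported as "up to" a value would correspond to the largest such percentage across the evaluated settings.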

Abstract

Reasoning-centric large language models (LLMs) achieve strong performance by generating intermediate reasoning trajectories, but often incur excessive token usage and high inference-time decoding cost. We observe that, when solving the same problems, larger reasoning models can often produce more concise traces, whereas smaller reasoning models tend to generate longer and more redundant trajectories. This is especially problematic in real-world deployment, where memory, latency, and serving-cost constraints often favor smaller models. Our observations suggest that reasoning compression can be transferred from large models to small ones rather than enforced through explicit length constraints. Based on this insight, we propose Mixed-Policy Distillation (MPD), a reasoning compression framework that transfers concise reasoning behavior from a larger-sized teacher to a smaller student by…
