Structural Rationale Distillation via Reasoning Space Compression
Jialin Yang, Jiankun Wang, Jiajun Wu, Henry Leung, Jiayu Zhou, Steve Drew

TL;DR
This paper introduces D-RPC, a reasoning path compression method that improves reasoning distillation from large to small language models by conditioning the teacher on a compact bank of reusable reasoning paths, which yields more consistent and effective rationales.
Contribution
The paper proposes a novel reasoning path compression technique for distillation that balances coverage against supervision entropy, supported by a PAC-Bayes analysis and extensive empirical validation.
Findings
D-RPC outperforms existing distillation methods across multiple benchmarks.
Smaller reasoning path banks can achieve optimal generalization performance.
D-RPC produces more consistent and diverse rationales with fewer tokens.
Abstract
When distilling reasoning from large language models (LLMs) into smaller ones, teacher rationales for similar problems often vary wildly in structure and strategy. Like a chef who makes the same dish differently each time, this inconsistency burdens the student with noisy supervision that is hard to internalize. We propose Distillation through Reasoning Path Compression (D-RPC), which constrains the teacher to follow a compact, dynamically maintained bank of reusable high-level reasoning paths. For each training question, D-RPC retrieves the most relevant path and conditions the teacher to follow it, producing rationales that are consistent across similar problems yet diverse enough to cover different problem types. A PAC-Bayes analysis formalizes the resulting trade-off between bank size and coverage: smaller banks reduce supervision entropy but risk coverage gaps, and the…
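The retrieve-then-condition step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the bank contents, the bag-of-words cosine similarity (a stand-in for whatever retriever D-RPC actually uses), the prompt format, and the names `PATH_BANK`, `retrieve_path`, `teacher_prompt` and `path_entropy` are all hypothetical. The bank here is static, whereas D-RPC maintains its bank dynamically. `path_entropy` computes the Shannon entropy of how questions are routed across paths, a simple proxy (not the paper's formal quantity) for the bank-size side of the trade-off: with K paths in the bank, this entropy is at most log K.

```python
import math
import re
from collections import Counter

# Hypothetical bank of high-level reasoning paths (names and texts are illustrative).
PATH_BANK = {
    "equation": "define the unknown, write an equation, solve the equation, check the answer",
    "cases": "split the problem into cases, solve each case, combine the cases",
    "backward": "start from the goal, work backward step by step to the given facts",
}


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_path(question: str, bank: dict[str, str]) -> str:
    """Return the name of the bank path most similar to the question."""
    q = bag_of_words(question)
    return max(bank, key=lambda name: cosine(q, bag_of_words(bank[name])))


def teacher_prompt(question: str, bank: dict[str, str]) -> str:
    """Condition the teacher on the retrieved path (prompt wording is hypothetical)."""
    path = bank[retrieve_path(question, bank)]
    return f"Follow this reasoning path: {path}\nQuestion: {question}\nRationale:"


def path_entropy(assignments: list[str]) -> float:
    """Shannon entropy (in nats) of how often each path is used across questions."""
    n = len(assignments)
    counts = Counter(assignments)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Usage: route a few toy questions and compare the routing entropy to log K.
questions = [
    "Solve the equation 3x + 5 = 20 for the unknown x.",
    "Consider two cases: n is even or n is odd. Is n*n + n even?",
    "Work backward from the goal: how many coins did Ana start with?",
]
assignments = [retrieve_path(q, PATH_BANK) for q in questions]
print(assignments)  # ['equation', 'cases', 'backward']
print(path_entropy(assignments), math.log(len(PATH_BANK)))
print(teacher_prompt(questions[0], PATH_BANK))
```

In this toy setting, shrinking the bank lowers the log K ceiling on the routing entropy, but a question that matches no remaining path well (cosine similarity near zero) is still forced onto the closest one, which loosely mirrors the coverage gap the abstract describes.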
