Validity-Calibrated Reasoning Distillation
Khouloud Saadi, Di Wang

TL;DR
This paper introduces a reasoning distillation framework that compares the student's and teacher's next-step proposals under the same prefix and scales supervision by their relative local validity, improving the transfer of reasoning skills from large to smaller language models.
Contribution
It proposes a validity-calibrated approach that treats reasoning distillation as local learning-signal allocation rather than static trajectory imitation, so supervision adapts to the context of each step.
Findings
Outperforms strong baselines on mathematical reasoning, code generation, and instruction-following tasks.
Uses local validity to adapt supervision strength, improving reasoning quality.
Demonstrates that flexible, locally calibrated learning signals enhance distillation effectiveness.
Abstract
Reasoning distillation aims to transfer multi-step reasoning capabilities from large language models to smaller, more efficient ones. While recent methods have shown promising gains, they typically rely on static teacher-student hierarchies and frame distillation as trajectory imitation. This is misaligned with the structure of reasoning, where intermediate steps are often locally under-specified: global correctness constrains the final answer, but does not uniquely determine each intermediate move. We propose validity-calibrated reasoning distillation, a framework that treats reasoning distillation as a problem of local learning-signal allocation rather than path alignment. Instead of enforcing token-level imitation, we compare the student's and teacher's proposed next-step actions under the same prefix and use their relative local validity to modulate the strength of the distillation…
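The weighting idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract is truncated and does not specify how validity is scored or how the weight is computed. The sketch assumes each step already has a validity score in [0, 1] for both the student's and the teacher's proposed next step, and it uses a logistic function of the score gap as a hypothetical weighting rule. The names validity_weight, calibrated_distillation_loss, and temperature are illustrative only.

```python
import math


def validity_weight(student_validity, teacher_validity, temperature=0.25):
    """Hypothetical rule: map the teacher-minus-student validity gap to (0, 1).

    A weight near 1 means the teacher's step looks locally better, so the
    distillation signal is applied strongly; a weight near 0 means the
    student's own step is at least as valid, so imitation is relaxed.
    """
    gap = teacher_validity - student_validity
    return 1.0 / (1.0 + math.exp(-gap / temperature))


def calibrated_distillation_loss(step_losses, student_validities,
                                 teacher_validities, temperature=0.25):
    """Average per-step distillation losses, each scaled by its validity weight.

    step_losses are assumed to be precomputed per-step imitation losses
    (for example, a divergence between teacher and student distributions).
    """
    if not (len(step_losses) == len(student_validities) == len(teacher_validities)):
        raise ValueError("all inputs must have one entry per reasoning step")
    if not step_losses:
        return 0.0
    weights = [
        validity_weight(s, t, temperature)
        for s, t in zip(student_validities, teacher_validities)
    ]
    return sum(w * l for w, l in zip(weights, step_losses)) / len(step_losses)


if __name__ == "__main__":
    # Step 1: teacher clearly better. Step 2: student clearly better.
    loss = calibrated_distillation_loss(
        step_losses=[1.0, 1.0],
        student_validities=[0.1, 0.9],
        teacher_validities=[0.9, 0.1],
    )
    print(loss)
```

Under this assumed rule, a step where both proposals are equally valid gets weight 0.5, and the temperature parameter controls how sharply the weight reacts to a validity gap.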
