Validity-Calibrated Reasoning Distillation

Khouloud Saadi; Di Wang

arXiv:2605.04078 · cs.LG · May 12, 2026


TL;DR

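This paper introduces a reasoning distillation framework that uses the relative local validity of the student's and teacher's next-step actions to modulate supervision strength, rather than imitating fixed teacher trajectories, with the aim of improving the transfer of reasoning skills from large to smaller language models.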

Contribution

It proposes a validity-calibrated approach that treats reasoning distillation as local learning-signal allocation rather than static trajectory imitation.

Findings

01. Outperforms strong baselines on mathematical reasoning, code generation, and instruction-following tasks.

02. Uses local validity to adapt supervision strength, improving reasoning quality.

03. Demonstrates that flexible, locally calibrated learning signals enhance distillation effectiveness.

Abstract

Reasoning distillation aims to transfer multi-step reasoning capabilities from large language models to smaller, more efficient ones. While recent methods have shown promising gains, they typically rely on static teacher-student hierarchies and frame distillation as trajectory imitation. This is misaligned with the structure of reasoning, where intermediate steps are often locally under-specified: global correctness constrains the final answer, but does not uniquely determine each intermediate move. We propose validity-calibrated reasoning distillation, a framework that treats reasoning distillation as a problem of local learning-signal allocation rather than path alignment. Instead of enforcing token-level imitation, we compare the student's and teacher's proposed next-step actions under the same prefix and use their relative local validity to modulate the strength of the distillation…
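
The idea of modulating distillation strength by relative local validity can be illustrated with a minimal sketch. This is not the paper's formulation, since the abstract is truncated before the details. It assumes that each proposed next step receives a validity score in [0, 1] from some external scorer, and that the gate is a logistic function of the teacher-minus-student validity gap. The names `validity_weight`, `calibrated_step_loss`, and the `temperature` parameter are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def validity_weight(teacher_validity, student_validity, temperature=0.5):
    """Hypothetical gate (an assumption, not the paper's rule).

    Maps the validity gap to (0, 1) with a logistic function, so the
    teacher's signal is strong when its step is more valid than the
    student's and weak when the student's own step is already better.
    """
    gap = teacher_validity - student_validity
    return 1.0 / (1.0 + math.exp(-gap / temperature))


def calibrated_step_loss(teacher_probs, student_probs,
                         teacher_validity, student_validity):
    """Per-step distillation loss scaled by the hypothetical validity gate."""
    weight = validity_weight(teacher_validity, student_validity)
    return weight * kl_divergence(teacher_probs, student_probs)


# Illustrative inputs: next-step distributions under the same prefix.
teacher_probs = [0.7, 0.2, 0.1]
student_probs = [0.4, 0.4, 0.2]

# Teacher's step is judged more valid: strong pull toward the teacher.
strong = calibrated_step_loss(teacher_probs, student_probs, 0.9, 0.2)
# Student's step is judged more valid: the same divergence is down-weighted.
weak = calibrated_step_loss(teacher_probs, student_probs, 0.2, 0.9)
print(strong, weak)
```

In this sketch the divergence term is identical in both calls; only the gate changes, which is the sense in which supervision is allocated locally rather than applied uniformly along a teacher trajectory.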
