HEAL: Hindsight Entropy-Assisted Learning for Reasoning Distillation
Wenjing Zhang, Jiangze Yan, Jieyun Huang, Yi Shen, Shuming Shi, Ping Chen, Ning Wang, Zhaoxiang Liu, Kai Wang, Shiguo Lian

TL;DR
HEAL is an RL-free framework that improves reasoning distillation from large reasoning models to smaller ones by actively repairing broken reasoning trajectories and filtering for genuine breakthroughs, outperforming traditional SFT distillation on multiple benchmarks.
Contribution
HEAL introduces a new RL-free distillation framework with three modules that improve reasoning transfer by active intervention, filtering, and curriculum evolution.
Findings
HEAL outperforms traditional SFT distillation on multiple benchmarks.
HEAL effectively repairs reasoning trajectories using entropy dynamics.
HEAL's curriculum strategy accelerates reasoning skill transfer.
Abstract
Distilling reasoning capabilities from Large Reasoning Models (LRMs) into smaller models is typically constrained by the limitation of rejection sampling. Standard methods treat the teacher as a static filter, discarding complex "corner-case" problems where the teacher fails to explore valid solutions independently, thereby creating an artificial "Teacher Ceiling" for the student. In this work, we propose Hindsight Entropy-Assisted Learning (HEAL), an RL-free framework designed to bridge this reasoning gap. Drawing on the educational theory of the Zone of Proximal Development (ZPD), HEAL synergizes three core modules: (1) Guided Entropy-Assisted Repair (GEAR), an active intervention mechanism that detects critical reasoning breakpoints via entropy dynamics and injects targeted hindsight hints to repair broken trajectories; (2) Perplexity-Uncertainty Ratio Estimator (PURE), a rigorous…
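The abstract says only that GEAR "detects critical reasoning breakpoints via entropy dynamics"; the exact criterion is not given here. As a rough illustration of the general idea, and not the authors' method, the sketch below computes the Shannon entropy of each next-token distribution in a trajectory and flags the step where entropy rises most sharply above a short trailing average. The function names, the `window` parameter, and the "largest jump" rule are all assumptions made for this example.

```python
import math
from typing import List, Optional, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_trace(step_distributions: Sequence[Sequence[float]]) -> List[float]:
    """Entropy at every generation step of one trajectory."""
    return [token_entropy(dist) for dist in step_distributions]


def find_breakpoint(entropies: Sequence[float], window: int = 2) -> Optional[int]:
    """Hypothetical rule: return the step whose entropy exceeds the mean of the
    preceding `window` steps by the largest positive margin, or None if no
    step shows a rise (or the trace is too short)."""
    best_idx: Optional[int] = None
    best_jump = 0.0
    for i in range(window, len(entropies)):
        baseline = sum(entropies[i - window:i]) / window
        jump = entropies[i] - baseline
        if jump > best_jump:
            best_idx, best_jump = i, jump
    return best_idx


# Toy trajectory over a 4-token vocabulary: the model is confident for two
# steps, becomes maximally uncertain at step 2, then partially recovers.
toy_distributions = [
    [0.97, 0.01, 0.01, 0.01],
    [0.94, 0.02, 0.02, 0.02],
    [0.25, 0.25, 0.25, 0.25],
    [0.70, 0.10, 0.10, 0.10],
]
breakpoint_step = find_breakpoint(entropy_trace(toy_distributions))
```

In a repair pipeline of the kind the abstract describes, a step flagged this way would be the candidate position for injecting a hint; how HEAL actually selects the position and phrases the hint is not specified in the text above.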
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Child and Animal Learning Development · Innovative Teaching and Learning Methods
