Distribution Corrected Offline Data Distillation for Large Language Models
Yumeng Zhang, Zhengbang Yang, Yevin Nikhel Goonatilake, and Zhuangdi Zhu

TL;DR
This paper introduces a new offline reasoning distillation framework for large language models that corrects distributional drift, leading to improved reasoning accuracy and more stable traces without requiring online sampling.
Contribution
It proposes a principled, distribution-correction-aware offline distillation method that enhances reasoning performance and stability in large language models.
Findings
Improves reasoning accuracy on GSM8K, MATH, MATH500, AMC, AIME, and OlympiadBench.
Produces more stable reasoning traces compared to prior offline algorithms.
Maintains instruction-following capabilities while strengthening reasoning.
Abstract
Distilling reasoning traces from strong large language models into smaller ones is a promising route to improve intelligence in resource-constrained settings. Existing approaches face a fundamental trade-off. Offline distillation from teacher-generated traces provides high-quality, sample-efficient supervision but suffers from distributional drift: during training, the student model conditions on teacher-generated prefixes, whereas during inference the student autoregresses on self-generated prefixes, leading to compounding errors over long reasoning trajectories. Meanwhile, on-policy or self-distillation methods better match the student's inference-time distribution, but require costly online sampling and often produce low-quality traces in early training. We propose a principled offline reasoning distillation framework that preserves the efficiency and supervision quality of offline…
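The compounding-error concern described in the abstract can be illustrated with a back-of-the-envelope calculation. The sketch below is not the paper's method: it assumes, purely for illustration, that the student makes an unrecoverable mistake at each generation step independently with a fixed probability. The function name `prob_error_free` and the 1% error rate are hypothetical choices, not values from the paper.

```python
def prob_error_free(per_step_error: float, num_steps: int) -> float:
    """Probability that a trajectory of `num_steps` steps contains no error.

    Illustrative assumption (not from the paper): each step errs
    independently with probability `per_step_error`, and any single
    error derails the rest of the reasoning trace.
    """
    if not 0.0 <= per_step_error <= 1.0:
        raise ValueError("per_step_error must be in [0, 1]")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return (1.0 - per_step_error) ** num_steps


if __name__ == "__main__":
    # Hypothetical 1% per-step error rate across increasing trace lengths.
    for steps in (10, 100, 1000):
        print(f"{steps:>5} steps: {prob_error_free(0.01, steps):.4f}")
```

Under this simplified model, even a small per-step error rate leaves little chance of an error-free trace once trajectories grow long, which is why a mismatch between training-time and inference-time prefixes matters most for long reasoning chains.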
