Student-in-the-Loop Chain-of-Thought Distillation via Generation-Time Selection
Chaoqun He, Yingfa Chen, Chaojun Xiao, Xu Han, Lijie Wen

TL;DR
This paper introduces Gen-SSD, a student-in-the-loop framework that selects reasoning trajectories at generation time, improving the distillation of large reasoning models into smaller models.
Contribution
It proposes a generation-time self-selection distillation method in which the student guides how reasoning trajectories are expanded during the teacher's sampling, and it outperforms existing post-hoc filtering approaches.
Findings
Gen-SSD outperforms standard knowledge distillation by around 5.9 points.
It improves on recent baselines by up to 4.7 points.
Gen-SSD produces more stable and learnable reasoning trajectories.
Abstract
Large reasoning models achieve strong performance on complex tasks through long chain-of-thought (CoT) trajectories, but directly transferring such reasoning processes to smaller models remains challenging. A key difficulty is that not all teacher-generated reasoning trajectories are suitable for student learning. Existing approaches typically rely on post-hoc filtering, selecting trajectories after full generation based on heuristic criteria. However, such methods cannot control the generation process itself and may still produce reasoning paths that lie outside the student's learning capacity. To address this limitation, we propose Gen-SSD (Generation-time Self-Selection Distillation), a student-in-the-loop framework that performs generation-time selection. Instead of passively consuming complete trajectories, the student evaluates candidate continuations during the teacher's sampling…
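The abstract is cut off before it states how the student scores candidates, so the following is only a minimal sketch of the general idea of generation-time selection, not the authors' implementation. It assumes the teacher proposes several candidate continuations per step and the student assigns each a scalar preference score (higher is better); the names `select_at_generation_time`, `propose`, `student_score`, and `stop_marker` are hypothetical and chosen for illustration.

```python
from typing import Callable, List


def select_at_generation_time(
    prompt: str,
    propose: Callable[[str, int], List[str]],
    student_score: Callable[[str, str], float],
    num_candidates: int = 4,
    max_steps: int = 8,
    stop_marker: str = "<eos>",
) -> str:
    """Build one reasoning trajectory step by step.

    propose(context, k)       -- hypothetical teacher call returning up to k
                                 candidate continuations of the context.
    student_score(context, c) -- hypothetical student call returning a score
                                 for candidate c (assumed: higher = easier
                                 for the student to learn from).
    """
    trajectory = ""
    for _ in range(max_steps):
        context = prompt + trajectory
        candidates = propose(context, num_candidates)
        if not candidates:
            break
        # The student, not the teacher, decides which continuation is kept.
        best = max(candidates, key=lambda c: student_score(context, c))
        trajectory += best
        # Stop once the chosen continuation signals the end of the trajectory.
        if stop_marker in best:
            break
    return trajectory
```

In contrast to post-hoc filtering, which would score only completed trajectories, this loop applies the student's preference at every expansion step, so rejected continuations are never extended further.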
