TL;DR
The paper presents CoRD, a collaborative multi-teacher decoding method for efficient, high-quality reasoning data distillation in Long-CoT models, outperforming existing approaches.
Contribution
The paper introduces CoRD, a step-wise collaborative decoding framework that combines heterogeneous teachers with dynamic exploration to improve reasoning data distillation.
Findings
CoRD produces higher-quality reasoning data than existing approaches.
Students trained on CoRD data reach near teacher-level performance with fewer, structured supervision signals.
CoRD generalizes well to out-of-domain and open-ended tasks.
Abstract
Distilling large reasoning models (LRMs) is essential for making Long-CoT reasoning practical, as full-scale inference remains computationally prohibitive. Existing curation-based approaches select complete reasoning traces post-hoc, overlooking collaboration among heterogeneous teachers and lacking dynamic exploration, which leads to redundant sampling and missed complementary reasoning. We introduce CoRD, a collaborative multi-teacher decoding framework that performs step-wise reasoning synthesis guided by predictive perplexity-based scoring and beam search. This enables heterogeneous LRMs to jointly construct coherent reasoning trajectories while efficiently preserving diverse, high-potential hypotheses. Experiments show that CoRD produces higher-quality reasoning data and achieves near teacher-level student performance with fewer, structured supervision signals, without substantial…
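The step-wise procedure described in the abstract can be illustrated roughly as follows. This is a minimal sketch, not the authors' implementation: the teacher interface, the function names, and the toy teachers are all hypothetical, and because this summary does not specify the paper's "predictive perplexity-based scoring", the sketch substitutes plain perplexity over the accumulated token log-probabilities. At each step, every teacher proposes a continuation for every partial trajectory in the beam, and only the lowest-perplexity trajectories survive.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical teacher interface (an assumption, not from the paper): given the
# steps so far, return one proposed next step and its per-token log-probs.
Step = Tuple[str, List[float]]
Teacher = Callable[[List[str]], Step]


def perplexity(logprobs: List[float]) -> float:
    """Perplexity = exp(-mean token log-probability); lower is better."""
    return math.exp(-sum(logprobs) / len(logprobs))


def collaborative_beam_search(
    teachers: List[Teacher], num_steps: int, beam_width: int
) -> List[Tuple[List[str], float]]:
    """Return up to beam_width (trajectory, perplexity) pairs, best first."""
    # Each beam entry holds (steps so far, all token log-probs so far).
    beams: List[Tuple[List[str], List[float]]] = [([], [])]
    for _ in range(num_steps):
        candidates = []
        for steps, logprobs in beams:
            # Every teacher may extend every surviving partial trajectory,
            # so a single trajectory can mix steps from different teachers.
            for teacher in teachers:
                text, step_logprobs = teacher(steps)
                candidates.append((steps + [text], logprobs + step_logprobs))
        # Keep the beam_width trajectories with the lowest running perplexity.
        candidates.sort(key=lambda c: perplexity(c[1]))
        beams = candidates[:beam_width]
    return [(steps, perplexity(lp)) for steps, lp in beams]


# Toy teachers with fixed confidence, for illustration only.
def confident_teacher(steps: List[str]) -> Step:
    return (f"A{len(steps)}", [-0.1, -0.1])


def unsure_teacher(steps: List[str]) -> Step:
    return (f"B{len(steps)}", [-1.0, -1.0])


results = collaborative_beam_search(
    [confident_teacher, unsure_teacher], num_steps=2, beam_width=2
)
best_steps, best_ppl = results[0]
```

In this toy run the best trajectory uses only the confident teacher, while the second beam slot holds a trajectory that mixes steps from both teachers. A real setup would replace the toy callables with calls to actual reasoning models and would need a stopping criterion for finished trajectories, both of which are omitted here.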
