Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment
Yuming Yang, Mingyoung Lai, Wanxu Zhao, Xiaoran Fan, Zhiheng Xi, Mingqi Wu, Chiyue Huang, Jun Zhao, Haijun Lv, Jian Tong, Yunhua Zhou, Yicheng Zou, Qipeng Guo, Tao Gui, Qi Zhang, Xuanjing Huang

TL;DR
This paper introduces the Rank-Surprisal Ratio (RSR), a metric that scores how suitable a reasoning trajectory is for training a given student LLM by capturing both its informativeness and its alignment with the student.
Contribution
The paper proposes RSR, a simple metric that outperforms existing measures at selecting both reasoning trajectories and teachers for training student models.
Findings
RSR correlates strongly with the student's reasoning performance (average Spearman correlation of 0.86).
RSR outperforms existing metrics in trajectory and teacher selection.
RSR is easy to compute and interpret across diverse models and trajectories.
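For readers unfamiliar with the Spearman figure quoted above, the following is a minimal sketch of how a rank correlation between a metric and downstream performance could be computed. The function names (`average_ranks`, `pearson`, `spearman`) are illustrative and are not taken from the paper; the sketch uses only the standard definition of Spearman correlation (Pearson correlation of the ranks, with ties given their average rank).

```python
import math


def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In use, `x` would hold one metric value per candidate dataset and `y` the corresponding student score after training. A value near 1 or -1 means the metric orders the candidates almost exactly as the downstream results do, in the same or the reverse direction.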
Abstract
Long chain-of-thought (CoT) trajectories provide rich supervision signals for distilling reasoning from teacher to student LLMs. However, both prior work and our experiments show that trajectories from stronger teachers do not necessarily yield better students, highlighting the importance of data-student suitability in distillation. Existing methods assess suitability primarily through student likelihood, favoring trajectories that align closely with the student model's current behavior but overlooking more informative ones. Addressing this, we propose Rank-Surprisal Ratio (RSR), a simple metric that captures both alignment and informativeness to assess the suitability of a reasoning trajectory. RSR is motivated by the observation that effective trajectories typically balance learning signal strength and behavioral alignment by combining low absolute probability with relatively…
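The abstract is cut off before it states the metric's exact definition, so the following is only a sketch under a stated assumption: that RSR divides the summed token ranks of a trajectory by its summed token surprisals, both measured under the student model. The function names, the use of raw logits as input, and the absence of any clipping or normalization are assumptions for illustration, not the authors' implementation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one next-token logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_stats(logits, target_id):
    """Return (rank, surprisal) of the target token under the student.

    rank 1 means the target is the student's most probable token;
    surprisal is the negative log-probability in nats.
    """
    logp = log_softmax(logits)
    surprisal = -logp[target_id]
    rank = 1 + sum(1 for lp in logp if lp > logp[target_id])
    return rank, surprisal


def rank_surprisal_ratio(step_logits, target_ids):
    """ASSUMED form of RSR: sum of token ranks / sum of token surprisals.

    step_logits: one student logit vector per trajectory token.
    target_ids:  the token id actually present at each position.
    """
    total_rank, total_surprisal = 0, 0.0
    for logits, target_id in zip(step_logits, target_ids):
        rank, surprisal = token_stats(logits, target_id)
        total_rank += rank
        total_surprisal += surprisal
    if total_surprisal == 0.0:
        raise ValueError("total surprisal is zero; ratio is undefined")
    return total_rank / total_surprisal
```

Under this assumed form, a token to which the student assigns low probability (high surprisal) while still ranking it near the top pulls the ratio down, whereas a token that is both improbable and poorly ranked pushes it up.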
