Learning from Diverse Reasoning Paths with Routing and Collaboration

Zhenyu Lei; Zhen Tan; Song Wang; Yaochen Zhu; Zihan Chen; Yushun Dong; Jundong Li

arXiv:2508.16861 · cs.CL · August 26, 2025

TL;DR

This paper introduces QR-Distill, a novel knowledge distillation method that uses quality filtering, conditional routing, and peer teaching to improve reasoning in compact language models by effectively capturing diverse reasoning paths.

Contribution

It presents QR-Distill, a new approach combining path quality filtering, dynamic routing, and cooperative peer teaching for better knowledge transfer from large to small language models.

Findings

01. QR-Distill outperforms traditional distillation methods.

02. Each component (quality filtering, routing, peer teaching) significantly improves performance.

03. Ablation studies confirm the effectiveness of all proposed components.

Abstract

Advances in large language models (LLMs) significantly enhance reasoning capabilities, but their deployment is restricted in resource-constrained scenarios. Knowledge distillation addresses this by transferring knowledge from powerful teacher models to compact and transparent students. However, effectively capturing the teacher's comprehensive reasoning is challenging because conventional token-level supervision has limited scope. Using multiple reasoning paths per query alleviates this problem, but treating each path identically is suboptimal, as paths vary widely in quality and suitability across tasks and models. We propose Quality-filtered Routing with Cooperative Distillation (QR-Distill), which combines path quality filtering, conditional routing, and cooperative peer teaching. First, quality filtering retains only correct reasoning paths scored by an LLM-based evaluation. Second,…
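The quality-filtering step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `ReasoningPath` type, the `filter_paths` function, the `min_score` threshold, and the word-count `toy_judge` are all hypothetical stand-ins, and reading "correct" as an exact match against a gold answer is an assumption. In practice the judge would be a call to an LLM-based evaluator.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ReasoningPath:
    text: str          # the teacher's step-by-step rationale
    final_answer: str  # the answer the rationale arrives at


def filter_paths(
    paths: List[ReasoningPath],
    gold_answer: str,
    judge: Callable[[ReasoningPath], float],
    min_score: float = 0.5,  # hypothetical threshold, not from the paper
) -> List[ReasoningPath]:
    """Keep paths whose answer matches the gold label and whose judge score passes."""
    kept = []
    for path in paths:
        # Assumption: correctness means a case-insensitive exact match.
        is_correct = path.final_answer.strip().lower() == gold_answer.strip().lower()
        if is_correct and judge(path) >= min_score:
            kept.append(path)
    return kept


def toy_judge(path: ReasoningPath) -> float:
    # Stand-in for an LLM-based evaluator: scores by rationale length, capped at 1.0.
    return min(len(path.text.split()) / 10.0, 1.0)


paths = [
    ReasoningPath("2 plus 2 equals 4 because adding two pairs gives four items total", "4"),
    ReasoningPath("guess", "4"),  # correct answer, but the rationale scores too low
    ReasoningPath("2 plus 2 equals 5 because of a long but wrong chain of steps here", "5"),
]
kept = filter_paths(paths, gold_answer="4", judge=toy_judge)
print(len(kept))
```

Passing the judge in as a callable keeps the filter independent of any particular evaluator, so a real LLM-based scorer could replace `toy_judge` without touching the filtering logic.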
