TL;DR
This paper introduces QR-Distill, a novel knowledge distillation method that uses quality filtering, conditional routing, and peer teaching to improve reasoning in compact language models by effectively capturing diverse reasoning paths.
Contribution
The paper presents QR-Distill, an approach that combines path quality filtering, conditional routing, and cooperative peer teaching to transfer knowledge from large to small language models.
Findings
QR-Distill outperforms traditional distillation methods.
Ablation studies indicate that each component (quality filtering, routing, peer teaching) contributes to the performance gain.
Abstract
Advances in large language models (LLMs) significantly enhance reasoning capabilities, but their deployment is restricted in resource-constrained scenarios. Knowledge distillation addresses this by transferring knowledge from powerful teacher models to compact and transparent students. However, effectively capturing the teacher's comprehensive reasoning is challenging because conventional token-level supervision has limited scope. Using multiple reasoning paths per query alleviates this problem, but treating each path identically is suboptimal, as paths vary widely in quality and suitability across tasks and models. We propose Quality-filtered Routing with Cooperative Distillation (QR-Distill), combining path quality filtering, conditional routing, and cooperative peer teaching. First, quality filtering retains only correct reasoning paths scored by an LLM-based evaluation. Second,…
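The quality-filtering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `ReasoningPath` type, the `filter_paths` function, the `min_score` threshold, and the `toy_judge` stand-in for an LLM-based evaluator are all hypothetical names introduced here, and the abstract excerpt does not say how scores are thresholded.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ReasoningPath:
    text: str          # teacher-generated reasoning (hypothetical field)
    final_answer: str  # answer extracted from the path (hypothetical field)


def filter_paths(
    paths: List[ReasoningPath],
    reference_answer: str,
    judge: Callable[[ReasoningPath], float],
    min_score: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[ReasoningPath]:
    """Keep paths that reach the reference answer and satisfy the judge."""
    kept = []
    for path in paths:
        # Discard paths whose final answer is wrong.
        if path.final_answer.strip() != reference_answer.strip():
            continue
        # Discard correct paths that the judge scores below the threshold.
        if judge(path) >= min_score:
            kept.append(path)
    return kept


def toy_judge(path: ReasoningPath) -> float:
    """Stand-in for an LLM-based evaluator: scores by word count, capped at 1."""
    return min(len(path.text.split()) / 10.0, 1.0)


candidates = [
    ReasoningPath("2 plus 2 equals 4 because we add two pairs", "4"),
    ReasoningPath("guess", "4"),
    ReasoningPath("2 plus 2 equals 5 after carrying the one over", "5"),
]
kept_paths = filter_paths(candidates, "4", toy_judge)
```

In this toy run only the first candidate survives: the second reaches the right answer but scores too low, and the third ends on the wrong answer. In practice the judge would be a call to an evaluator model rather than a word-count heuristic.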