"The Whole Is Greater Than the Sum of Its Parts": A Compatibility-Aware Multi-Teacher CoT Distillation Framework
Jin Cui, Jiaqi Guo, Ruixuan Yang, Jiayi Lu, Jiepeng Zhou, Jiajun Xu, Jiangcheng Song, Boran Zhao, Pengju Ren

TL;DR
COMPACT is a framework that adaptively combines multiple teacher models' reasoning guidance to improve small language models' reasoning abilities without losing original knowledge.
Contribution
It introduces a dynamic, multi-metric approach to fuse diverse teacher supervisions, enhancing reasoning transfer while avoiding negative transfer and catastrophic forgetting.
Findings
Achieves state-of-the-art performance on reasoning benchmarks.
Effectively mitigates catastrophic forgetting in student models.
Successfully integrates diverse reasoning capabilities from multiple teachers.
Abstract
Chain-of-Thought (CoT) reasoning empowers Large Language Models (LLMs) with remarkable capabilities but typically requires prohibitive parameter scales. CoT distillation has emerged as a promising paradigm to transfer reasoning prowess into compact Student Models (SLMs), but existing approaches often rely on a solitary teacher, capping the student's potential since individual LLMs often exhibit distinct capability biases and may suffer from catastrophic forgetting. While leveraging diverse teachers seems appealing, effectively fusing their supervisions remains challenging: teacher-student incompatibility risks amplifying hallucinations, and passive supervision fails to ensure genuine logic internalization. To address this, we introduce COMPACT, a framework that adaptively fuses supervisions from different teachers by dynamically weighting teacher gradients based on the student's…
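The abstract describes fusing teacher supervision by dynamically weighting teacher gradients, but it is cut off before it states what the weights depend on. The sketch below only illustrates the general idea of a weighted gradient combination. It is not the COMPACT method: the `compatibility_scores` input, the softmax normalization, the `temperature` parameter, and both function names are assumptions made for illustration.

```python
import math


def softmax(scores, temperature=1.0):
    """Turn arbitrary real-valued scores into weights that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - shift) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_teacher_gradients(teacher_grads, compatibility_scores, temperature=1.0):
    """Combine per-teacher gradient vectors into one weighted update.

    teacher_grads: one flat gradient vector (list of floats) per teacher.
    compatibility_scores: hypothetical per-teacher scores; how the paper
        computes its weights is not given in the truncated abstract.
    """
    if len(teacher_grads) != len(compatibility_scores):
        raise ValueError("need exactly one score per teacher")
    weights = softmax(compatibility_scores, temperature)
    fused = [0.0] * len(teacher_grads[0])
    for weight, grad in zip(weights, teacher_grads):
        for i, g in enumerate(grad):
            fused[i] += weight * g
    return weights, fused


# Two toy teachers with equal scores contribute equally to the fused gradient.
weights, fused = fuse_teacher_gradients([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

With equal scores the fused gradient is the plain average of the teacher gradients; raising one teacher's score shifts the update toward that teacher's gradient.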
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Advanced Graph Neural Networks
