Beyond Imitation: Learning Key Reasoning Steps from Dual Chain-of-Thoughts in Reasoning Distillation
Chengwei Dai, Kun Li, Wei Zhou, Songlin Hu

TL;DR
This paper introduces EDIT, a novel distillation method that improves small language models' ability to learn key reasoning steps from large models by analyzing dual reasoning paths with divergent conclusions.
Contribution
We propose a mistake-driven distillation approach that identifies and emphasizes crucial reasoning steps, enhancing the learning of key reasoning processes in smaller models.
Findings
EDIT improves key reasoning step learning in SLMs
Dual CoTs reveal crucial reasoning steps with divergent conclusions
EDIT outperforms traditional fine-tuning on reasoning benchmarks
Abstract
As Large Language Models (LLMs) scale up and gain powerful Chain-of-Thoughts (CoTs) reasoning abilities, practical resource constraints drive efforts to distill these capabilities into more compact Smaller Language Models (SLMs). We find that CoTs consist mainly of simple reasoning forms, with a small proportion of key reasoning steps that truly impact conclusions. However, previous distillation methods typically involve supervised fine-tuning of student SLMs only on correct CoT data produced by teacher LLMs. As a result, students struggle to learn the key reasoning steps; instead they imitate the teacher's reasoning forms and make errors or omissions on these steps. To address these issues, drawing an analogy to human learning, where analyzing mistakes according to correct solutions often reveals the crucial steps leading to successes or failures, we propose…
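The abstract is truncated before it says how key reasoning steps are located, so the following is only an illustrative sketch of the general idea of comparing dual CoTs, not the authors' method. It assumes a pair of CoTs (one reaching the correct conclusion, one reaching a wrong one) that share most of their wording, and uses a token-level diff from Python's standard `difflib` to surface the spans where they diverge. The function name `divergent_spans` and the example strings are hypothetical.

```python
from difflib import SequenceMatcher


def divergent_spans(correct_cot: str, wrong_cot: str) -> list[tuple[str, str]]:
    """Return (correct_text, wrong_text) pairs for spans where two CoTs differ.

    Assumption: whitespace tokenization and a generic diff stand in for
    whatever alignment procedure the paper actually uses.
    """
    a = correct_cot.split()
    b = wrong_cot.split()
    # autojunk=False disables difflib's heuristic that ignores frequent tokens.
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # replace, insert, or delete: a point of divergence
            spans.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return spans


# Hypothetical dual CoTs that differ only in one operation and its result.
correct = "Tom has 3 apples and buys 2 more so he has 3 + 2 = 5 apples"
wrong = "Tom has 3 apples and buys 2 more so he has 3 - 2 = 1 apples"

for good, bad in divergent_spans(correct, wrong):
    print(f"correct: {good!r}  vs  wrong: {bad!r}")
```

In this toy pair, most tokens are shared "reasoning form", and only the operator and the result differ, which mirrors the abstract's claim that a small proportion of steps determines the conclusion.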
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · AI-based Problem Solving and Planning
