Effectiveness of Chain-of-Thought in Distilling Reasoning Capability from Large Language Models
Cong-Thanh Do, Rama Doddipatla, Kate Knill

TL;DR
This paper investigates how Chain-of-Thought prompting enhances the process of knowledge distillation from large to small language models, leading to improved reasoning abilities in the smaller models.
Contribution
It provides empirical evidence that Chain-of-Thought data significantly improves the effectiveness of white-box knowledge distillation for reasoning tasks.
Findings
CoT improves distilled models' performance on reasoning tasks
White-box KD with CoT data outperforms white-box KD without it
Distilled models achieve higher scores on the BBH benchmark
Abstract
Chain-of-Thought (CoT) prompting is a widely used method to improve the reasoning capability of Large Language Models (LLMs). More recently, CoT has been leveraged in Knowledge Distillation (KD) to transfer reasoning capability from a larger LLM to a smaller one. This paper examines the role of CoT in distilling the reasoning capability from larger LLMs to smaller LLMs using white-box KD, analysing its effectiveness in improving the performance of the distilled models for various natural language reasoning and understanding tasks. We conduct white-box KD experiments using LLMs from the Qwen and Llama2 families, employing CoT data from the CoT-Collection dataset. The distilled models are then evaluated on natural language reasoning and understanding tasks from the BIG-Bench-Hard (BBH) benchmark, which presents complex challenges for smaller LLMs. Experimental results demonstrate the role…
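As a rough illustration of what "white-box" means here: the student is trained against the teacher's full next-token distributions rather than only its generated text. The sketch below computes a token-level forward KL divergence between teacher and student distributions, averaged over a sequence. The choice of forward KL, the temperature parameter, and all function names are assumptions made for illustration; the excerpt above does not specify the loss the authors use. With CoT data, the averaged positions would cover the rationale tokens as well as the final answer.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def forward_kl(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over the vocabulary at one token position.

    Assumption: forward KL is one common white-box KD objective; the
    paper's actual objective is not given in this excerpt.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def sequence_kd_loss(teacher_seq, student_seq, temperature=1.0):
    """Average the per-position KL over every target token.

    Each argument is a list of per-position logit vectors. With CoT data
    the target would be rationale + answer, so rationale positions
    contribute to the average too.
    """
    per_token = [
        forward_kl(t, s, temperature) for t, s in zip(teacher_seq, student_seq)
    ]
    return sum(per_token) / len(per_token)


if __name__ == "__main__":
    # Toy 3-word vocabulary, 2 target positions (hypothetical values).
    teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
    student = [[1.5, 0.7, -0.5], [0.0, 0.0, 1.0]]
    print(sequence_kd_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's distribution at every position and grows as the two diverge, which is the signal a black-box setup (teacher text only) does not provide.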
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Multimodal Machine Learning Applications
