Effectiveness of Chain-of-Thought in Distilling Reasoning Capability from Large Language Models

Cong-Thanh Do; Rama Doddipatla; Kate Knill

arXiv:2511.05184·cs.CL·November 10, 2025

Effectiveness of Chain-of-Thought in Distilling Reasoning Capability from Large Language Models

Cong-Thanh Do, Rama Doddipatla, Kate Knill

PDF

Open Access

TL;DR

This paper investigates how Chain-of-Thought prompting enhances the process of knowledge distillation from large to small language models, leading to improved reasoning abilities in the smaller models.

Contribution

It provides empirical evidence that Chain-of-Thought data significantly improves the effectiveness of white-box knowledge distillation for reasoning tasks.

Findings

01

CoT improves distilled models' performance on reasoning tasks

02

White-box KD with CoT outperforms without CoT

03

Distilled models achieve higher scores on BBH benchmark

Abstract

Chain-of-Thought (CoT) prompting is a widely used method to improve the reasoning capability of Large Language Models (LLMs). More recently, CoT has been leveraged in Knowledge Distillation (KD) to transfer reasoning capability from a larger LLM to a smaller one. This paper examines the role of CoT in distilling the reasoning capability from larger LLMs to smaller LLMs using white-box KD, analysing its effectiveness in improving the performance of the distilled models for various natural language reasoning and understanding tasks. We conduct white-box KD experiments using LLMs from the Qwen and Llama2 families, employing CoT data from the CoT-Collection dataset. The distilled models are then evaluated on natural language reasoning and understanding tasks from the BIG-Bench-Hard (BBH) benchmark, which presents complex challenges for smaller LLMs. Experimental results demonstrate the role…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsTopic Modeling · Advanced Graph Neural Networks · Multimodal Machine Learning Applications