MCC-KD: Multi-CoT Consistent Knowledge Distillation

Hongzhan Chen; Siyue Wu; Xiaojun Quan; Rui Wang; Ming Yan; Ji Zhang

arXiv:2310.14747·cs.CL·December 21, 2023·2 cites

TL;DR

This paper introduces MCC-KD, a method for transferring complex reasoning skills from large language models to smaller ones by generating diverse rationales and ensuring consistency among their predictions.

Contribution

It proposes a novel knowledge distillation technique that enforces consistency among multiple rationales, improving reasoning capabilities in smaller models.

Findings

01 MCC-KD outperforms baseline methods on reasoning benchmarks.

02 It demonstrates robustness on out-of-distribution datasets.

03 It is effective across model architectures (LLaMA/FlanT5) and model scales (3B/7B/11B/13B).

Abstract

Large language models (LLMs) have showcased remarkable capabilities in complex reasoning through chain of thought (CoT) prompting. Recently, there has been a growing interest in transferring these reasoning abilities from LLMs to smaller models. However, achieving both the diversity and consistency in rationales presents a challenge. In this paper, we focus on enhancing these two aspects and propose Multi-CoT Consistent Knowledge Distillation (MCC-KD) to efficiently distill the reasoning capabilities. In MCC-KD, we generate multiple rationales for each question and enforce consistency among the corresponding predictions by minimizing the bidirectional KL-divergence between the answer distributions. We investigate the effectiveness of MCC-KD with different model architectures (LLaMA/FlanT5) and various model scales (3B/7B/11B/13B) on both mathematical reasoning and commonsense reasoning…
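The consistency objective mentioned in the abstract can be illustrated with a minimal sketch. It assumes the bidirectional KL-divergence is the plain sum of the two directed KL terms between the answer distributions produced under two different rationales. The exact weighting and batching used by the authors are not stated on this page. The function names (`softmax`, `kl_divergence`, `bidirectional_kl`) are hypothetical, and the sketch uses pure Python rather than the PyTorch code in the official repository.

```python
import math


def softmax(logits):
    """Convert raw answer logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """Directed KL(p || q) for two discrete distributions of equal length."""
    # eps guards against log(0); it is an assumption of this sketch.
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def bidirectional_kl(logits_a, logits_b):
    """Symmetric consistency penalty: KL(p || q) + KL(q || p).

    logits_a and logits_b stand for the student's answer logits for the
    same question under two different rationales (an assumed setup).
    """
    p = softmax(logits_a)
    q = softmax(logits_b)
    return kl_divergence(p, q) + kl_divergence(q, p)


if __name__ == "__main__":
    # Identical answer distributions incur no penalty.
    print(bidirectional_kl([2.0, 0.5, -1.0], [2.0, 0.5, -1.0]))
    # Disagreeing answer distributions incur a positive penalty.
    print(bidirectional_kl([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0]))
```

Minimizing this quantity pushes the two answer distributions toward each other. Summing both directions makes the penalty symmetric in the two rationales, so neither one is treated as the reference.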

Code & Models

Repositories

homzer/MCC-KD (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques

Methods: Knowledge Distillation · Focus