SCOTT: Self-Consistent Chain-of-Thought Distillation
Peifeng Wang, Zhengyang Wang, Zheng Li, Yifan Gao, Bing Yin, Xiang Ren

TL;DR
This paper introduces SCOTT, a method for distilling large language models into smaller, self-consistent models that generate more faithful rationales for their predictions, improving interpretability without sacrificing performance.
Contribution
The paper presents a novel knowledge distillation approach that ensures the smaller model produces consistent and faithful chain-of-thought rationales aligned with the teacher model.
Findings
Smaller models can generate more faithful rationales using SCOTT.
SCOTT achieves comparable task performance to larger models.
The method enhances the interpretability and trustworthiness of LLMs.
Abstract
Large language models (LMs) beyond a certain scale demonstrate the emergent capability of generating free-text rationales for their predictions via chain-of-thought (CoT) prompting. While CoT can yield dramatically improved performance, such gains are only observed for sufficiently large LMs. Even more concerning, there is little guarantee that the generated rationales are consistent with the LM's predictions or faithfully justify its decisions. In this work, we propose a faithful knowledge distillation method to learn a small, self-consistent CoT model from a teacher model that is orders of magnitude larger. To form better supervision, we elicit rationales supporting the gold answers from a large LM (teacher) by contrastive decoding, which encourages the teacher to generate tokens that become more plausible only when the answer is considered. To ensure faithful distillation, we use the…
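The contrastive decoding idea mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function name `contrastive_next_token`, the weight `alpha`, the scoring rule (log-probability plus a weighted gain in log-probability), and the example logits are all assumptions made for illustration. The sketch only shows how a token can be preferred when it becomes more plausible once the gold answer is included in the teacher's context.

```python
import math

def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def contrastive_next_token(logits_with_answer, logits_without_answer, alpha=1.0):
    """Pick the next token index by a contrastive score (hypothetical rule).

    score(t) = log p(t | context with gold answer)
               + alpha * [log p(t | with answer) - log p(t | without answer)]

    alpha = 0 reduces to ordinary greedy decoding on the answer-conditioned
    distribution; larger alpha favours tokens whose plausibility grows when
    the answer is present.
    """
    lp_with = log_softmax(logits_with_answer)
    lp_without = log_softmax(logits_without_answer)
    scores = [w + alpha * (w - wo) for w, wo in zip(lp_with, lp_without)]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy three-token vocabulary with made-up logits (assumed, not from the paper).
vocab = ["the", "because", "heavy"]
with_answer = [2.0, 1.0, 1.8]     # teacher sees the gold answer
without_answer = [2.0, 1.0, 0.2]  # teacher does not see the gold answer

greedy_choice = vocab[contrastive_next_token(with_answer, without_answer, alpha=0.0)]
contrastive_choice = vocab[contrastive_next_token(with_answer, without_answer, alpha=1.0)]
print(greedy_choice, contrastive_choice)
```

In this toy setup, plain greedy decoding picks the generic token "the", while the contrastive score picks "heavy", the token whose probability rises most when the answer is part of the context.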
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Materials Science
Methods: Knowledge Distillation
