Improve Student's Reasoning Generalizability through Cascading Decomposed CoTs Distillation

Chengwei Dai; Kun Li; Wei Zhou; Songlin Hu

arXiv:2405.19842·cs.CL·May 31, 2024

TL;DR

This paper introduces CasCoD, a two-step distillation method that improves the reasoning generalizability of smaller language models by focusing on rationales rather than preset answers, enhancing out-of-domain performance.

Contribution

The paper proposes a novel cascading decomposed CoTs distillation method that restructures training to improve reasoning generalizability of student models.

Findings

1. CasCoD outperforms baseline methods on both in-domain and out-of-domain reasoning tasks.

2. Removing the answer from the training outputs helps the model focus on rationales.

3. Two-step training improves reasoning diversity and generalization.

Abstract

Large language models (LLMs) exhibit enhanced reasoning at larger scales, driving efforts to distill these capabilities into smaller models via teacher-student learning. Previous works simply fine-tune student models on teachers' generated Chain-of-Thoughts (CoTs) data. Although these methods enhance in-domain (IND) reasoning performance, they struggle to generalize to out-of-domain (OOD) tasks. We believe that the widespread spurious correlations between questions and answers may lead the model to preset a specific answer which restricts the diversity and generalizability of its reasoning process. In this paper, we propose Cascading Decomposed CoTs Distillation (CasCoD) to address these issues by decomposing the traditional single-step learning process into two cascaded learning steps. Specifically, by restructuring the training objectives -- removing the answer from outputs and…
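
The decomposition described in the abstract can be illustrated with a minimal sketch. It assumes a teacher-generated CoT sample has already been split into a question, a rationale, and an answer. The first pair follows the abstract directly: the answer is removed from the output, so the student learns question to rationale. The second pair is an assumption, since the abstract is truncated at that point: here the question and rationale are concatenated as input and the answer is the target. The names `TrainingPair` and `decompose_cot_example`, the newline separator, and the sample question are all hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingPair:
    """One supervised (input, target) pair for the student model."""
    source: str
    target: str


def decompose_cot_example(
    question: str, rationale: str, answer: str
) -> tuple[TrainingPair, TrainingPair]:
    """Split one CoT sample into two cascaded training pairs.

    Step 1 (from the abstract): the answer is removed from the output,
    so the target is the rationale alone.
    Step 2 (assumed here): the input is the question followed by the
    rationale, and the target is the answer.
    """
    rationale_step = TrainingPair(source=question, target=rationale)
    # The newline separator is an illustrative choice, not the paper's format.
    answer_step = TrainingPair(source=f"{question}\n{rationale}", target=answer)
    return rationale_step, answer_step


# Hypothetical sample, invented for illustration only.
step1, step2 = decompose_cot_example(
    question="Tom has 3 apples and buys 2 more. How many apples does he have?",
    rationale="Tom starts with 3 apples. Buying 2 more gives 3 + 2 = 5.",
    answer="5",
)
```

In this sketch the answer never appears in the step-1 target, which is the property the abstract attributes to the restructured objective; how the two steps are weighted or combined during training is not covered here.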

Code & Models

Repositories

c-w-d/cascod (PyTorch, official)

Videos

Improve Student’s Reasoning Generalizability through Cascading Decomposed CoTs Distillation

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning

Methods: Focus