PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning

Xuekai Zhu; Biqing Qi; Kaiyan Zhang; Xinwei Long; Zhouhan Lin; Bowen Zhou

arXiv:2305.13888·cs.CL·March 21, 2024·5 cites

TL;DR

PaD introduces reasoning programs to improve distillation of reasoning capabilities from large language models to smaller models, outperforming some LLMs and baselines by reducing errors in synthetic data.

Contribution

The paper proposes Program-aided Distillation (PaD), a novel method that uses reasoning programs to enhance the quality of synthetic data for better small model reasoning.

Findings

01 Smaller models with PaD outperform certain large models like LLaMA-1 13B.

02 PaD achieves significant improvements over baseline distillation methods.

03 Error checking and iterative self-refinement enhance reasoning accuracy.

Abstract

While large language models (LLMs) excel in various natural language processing tasks, their huge size and the inaccessibility of parameters present challenges for practical deployment. Previous studies try to distill task-specific ability from LLMs to smaller models, using data synthesis and chain-of-thought (CoT) fine-tuning. However, synthetic CoT data often contains faulty reasoning, which deteriorates the quality of distillation, especially in reasoning capabilities. In this work, we propose Program-aided Distillation (PaD), which introduces reasoning programs to suppress the errors in distilled data, and thus achieves better distillation quality for reasoning tasks. In PaD, we utilize the reasoning program to substitute the CoT, allowing automated error checking of synthetic data. Further, through error injecting and further training, the small distilling model could iteratively…
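The automated error checking mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each synthetic sample carries a Python reasoning program that stores its result in a variable named `answer`, plus a reference answer to compare against. The helper names `run_reasoning_program` and `filter_synthetic_data`, the sample fields, and the tolerance are all hypothetical.

```python
from typing import Optional


def run_reasoning_program(program: str) -> Optional[float]:
    """Execute a candidate reasoning program and return its `answer`, or None on failure.

    Assumption: the program assigns its final result to a variable called `answer`.
    Emptying the builtins is a convenience here, not a real security sandbox.
    """
    scope: dict = {}
    try:
        exec(program, {"__builtins__": {}}, scope)
        return float(scope["answer"])
    except Exception:
        # Syntax errors, runtime errors, and a missing `answer` all count as faulty.
        return None


def filter_synthetic_data(samples: list, tol: float = 1e-6) -> list:
    """Keep only samples whose program runs and reproduces the reference answer."""
    kept = []
    for sample in samples:
        result = run_reasoning_program(sample["program"])
        if result is not None and abs(result - sample["gold"]) <= tol:
            kept.append(sample)
    return kept


# Hypothetical synthetic samples: one correct, one with faulty logic, one that does not parse.
samples = [
    {"question": "3 apples and 4 pears: how many fruits?",
     "program": "apples = 3\npears = 4\nanswer = apples + pears", "gold": 7},
    {"question": "3 apples and 4 pears: how many fruits?",
     "program": "apples = 3\npears = 4\nanswer = apples * pears", "gold": 7},
    {"question": "How many apples are there?",
     "program": "answer = 3 +", "gold": 3},
]

kept = filter_synthetic_data(samples)
print(len(kept))
```

Unlike a free-text chain of thought, a program either executes and yields a checkable value or it fails, which is what makes this kind of filtering automatic.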

Code & Models

Repositories

xuekai-zhu/pad
PyTorch · Official

Datasets

xuekai/pad_train
dataset · 15 dl

Videos

PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research

Methods: Pruning