PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning
Xuekai Zhu, Biqing Qi, Kaiyan Zhang, Xinwei Long, Zhouhan Lin, Bowen Zhou

TL;DR
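PaD replaces chain-of-thought rationales with executable reasoning programs when distilling reasoning ability from large language models into smaller ones. Because programs can be checked automatically, errors in the synthetic data are reduced, and the distilled models outperform baseline distillation methods and some larger LLMs.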
Contribution
The paper proposes Program-aided Distillation (PaD), a novel method that uses reasoning programs to enhance the quality of synthetic data for better small model reasoning.
Findings
Smaller models with PaD outperform certain large models like LLaMA-1 13B.
PaD achieves significant improvements over baseline distillation methods.
Error checking and iterative self-refinement enhance reasoning accuracy.
Abstract
While large language models (LLMs) excel in various natural language processing tasks, their huge size and the inaccessibility of parameters present challenges for practical deployment. Previous studies try to distill task-specific ability from LLMs to smaller models, using data synthesis and chain-of-thought (CoT) fine-tuning. However, synthetic CoT data often contains faulty reasoning, which deteriorates the quality of distillation, especially in reasoning capabilities. In this work, we propose Program-aided Distillation (PaD), which introduces reasoning programs to suppress the errors in distilled data, and thus achieves better distillation quality for reasoning tasks. In PaD, we utilize the reasoning program to substitute the CoT, allowing automated error checking of synthetic data. Further, through error injecting and further training, the small distilling model could iteratively…
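The automated error checking described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each synthetic sample is a (question, program, gold answer) triple, that a program stores its result in a variable named `answer`, and that a sample is kept only when the program runs and reproduces the gold answer. The names `run_program`, `filter_samples`, and `SAMPLES` are hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[str, str, float]  # (question, program source, gold answer)


def run_program(source: str) -> Optional[float]:
    """Execute a candidate reasoning program and return its `answer` value.

    Returns None when the program raises or defines no numeric `answer`.
    """
    namespace: dict = {}
    try:
        # Assumption: plain arithmetic programs only. Running model output
        # with exec is unsafe; a real pipeline should sandbox this step.
        exec(source, {"__builtins__": {}}, namespace)
    except Exception:
        return None
    value = namespace.get("answer")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    return float(value)


def filter_samples(samples: List[Sample], tol: float = 1e-6) -> List[Tuple[str, str]]:
    """Keep only samples whose program executes and matches the gold answer."""
    kept = []
    for question, program, gold in samples:
        result = run_program(program)
        if result is not None and abs(result - gold) <= tol:
            kept.append((question, program))
    return kept


# Hypothetical synthetic data: one correct program, one with faulty
# reasoning, and one that fails at run time.
QUESTION = "Tom has 3 bags of 4 apples and eats 2. How many remain?"
SAMPLES: List[Sample] = [
    (QUESTION, "bags = 3\nper_bag = 4\nanswer = bags * per_bag - 2", 10.0),
    (QUESTION, "answer = 3 + 4 - 2", 10.0),
    (QUESTION, "answer = bags * 4 - 2", 10.0),
]

kept = filter_samples(SAMPLES)
```

A free-text chain of thought offers no comparable check: a faulty rationale still reads as fluent text, whereas a faulty program either fails to run or produces a value that can be compared against the expected answer.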
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research
Methods: Pruning
