Smaller but Better: Self-Paced Knowledge Distillation for Lightweight yet Effective LCMs
Yujia Chen, Yang Ye, Zhongqi Li, Yuchi Ma, Cuiyun Gao

TL;DR
This paper introduces SODA, a self-paced knowledge distillation framework for building lightweight yet effective large code models (LCMs). The resulting student models improve substantially, and the best of them surpasses ChatGPT on average Pass@1.
Contribution
The paper proposes SODA, a novel self-paced knowledge distillation framework for developing lightweight code models, together with SodaCoder, a series of models that outperform larger counterparts.
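The term "self-paced" conventionally refers to self-paced learning, where training admits easy samples first and harder ones later. This summary does not describe SODA's own pacing rule, so the sketch below shows only the classic hard-weighting formulation. Given per-sample losses L_i and a pace threshold λ, the weights v_i ∈ {0, 1} that minimise Σ v_i L_i − λ Σ v_i are v_i = 1 exactly when L_i < λ. The geometric schedule, the function names, and the toy losses are illustrative assumptions and do not come from the paper.

```python
from typing import List, Sequence


def self_paced_weights(losses: Sequence[float], lam: float) -> List[int]:
    """Hard self-paced weights: v_i = 1 if loss_i < lam, else 0.

    This is the closed-form minimiser of sum(v_i * L_i) - lam * sum(v_i)
    over v_i in {0, 1}: a sample is kept only if its loss is below lam.
    """
    return [1 if loss < lam else 0 for loss in losses]


def pacing_schedule(lam0: float, growth: float, rounds: int) -> List[float]:
    """Hypothetical geometric schedule: lam grows each round, admitting harder samples."""
    return [lam0 * growth ** r for r in range(rounds)]


if __name__ == "__main__":
    # Toy per-sample losses (made up for illustration, not from the paper).
    losses = [0.2, 0.9, 1.5, 0.4, 2.3]
    for r, lam in enumerate(pacing_schedule(0.5, 2.0, 3)):
        selected = self_paced_weights(losses, lam)
        print(f"round {r}: lambda={lam:.1f}, selected={selected}")
```

In real self-paced training the losses would be recomputed after each round as the student improves. They are held fixed here only to show how a rising λ gradually enlarges the training set.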
Findings
SODA improves student models by 65.96% in average Pass@1.
SodaCoder models outperform 15 larger LCMs.
SodaCoder-DS-6.7B surpasses ChatGPT on average Pass@1.
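Pass@1 is the fraction of problems whose generated code passes the tests. When n samples are drawn per problem and c of them pass, the widely used unbiased estimator from the Codex evaluation is pass@k = 1 − C(n − c, k) / C(n, k), averaged over problems. For k = 1 this reduces to c / n. The sketch below implements that standard estimator. This summary does not state how many samples the paper draws per problem, so the (n, c) values are hypothetical.

```python
from math import comb
from statistics import mean
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them correct."""
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def average_pass_at_k(results: Sequence[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, each given as an (n, c) pair."""
    return mean(pass_at_k(n, c, k) for n, c in results)


if __name__ == "__main__":
    # Hypothetical per-problem (n, c) counts, not results from the paper.
    results = [(10, 3), (10, 0), (10, 10)]
    print(f"average Pass@1 = {average_pass_at_k(results, 1):.4f}")
```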
Abstract
Large code models (LCMs) have remarkably advanced the field of code generation. Despite their impressive capabilities, they still face practical deployment issues, such as high inference costs, limited accessibility of proprietary LCMs, and adaptability issues of ultra-large LCMs. These issues highlight the critical need for more accessible, lightweight yet effective LCMs. Knowledge distillation (KD) offers a promising solution by transferring the programming capabilities of larger, advanced LCMs to smaller, less powerful LCMs. In this paper, we propose a novel Self-Paced knOwledge DistillAtion framework, named SODA, aimed at developing lightweight yet effective student LCMs. SODA consists of three stages in one cycle: (1) the Correct-and-Fault Knowledge Delivery stage aims at improving the student model's capability to recognize errors while ensuring its basic programming skills during the…
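The abstract describes KD as transferring capabilities from a larger teacher to a smaller student. For reference, the sketch below shows the generic logit-based form (Hinton-style soft targets), not SODA's own objective, which this summary does not specify. The student is penalised by the KL divergence between temperature-softened teacher and student distributions, scaled by T². In an LCM this would be applied per token over the vocabulary. Proprietary teachers generally do not expose full logits, and in that case distillation typically fine-tunes the student on teacher-generated outputs instead. The function names and the temperature value are illustrative assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracts the max logit for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits: Sequence[float],
            student_logits: Sequence[float],
            temperature: float = 2.0) -> float:
    """Generic soft-target KD loss: T^2 * KL(p_teacher || p_student) at temperature T."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    # Toy next-token logits over a 3-token vocabulary (hypothetical values).
    teacher = [2.0, 1.0, 0.0]
    student = [0.0, 1.0, 2.0]
    print(f"KD loss = {kd_loss(teacher, student):.4f}")
```

The T² factor keeps gradient magnitudes comparable across temperatures, which is why it appears in the standard formulation.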
