An Empirical Study of Knowledge Distillation for Code Understanding Tasks
Ruiqi Wang, Zezhou Yang, Cuiyun Gao, Xin Xia, Qing Liao

TL;DR
This study systematically evaluates knowledge distillation techniques for code understanding, demonstrating that feature-based methods significantly improve student model performance while reducing parameters, with code-specific PLMs being particularly effective.
Contribution
It provides a comprehensive empirical analysis of KD methods in code understanding, highlighting the effectiveness of feature-based KD and insights into model architecture choices.
Findings
KD boosts performance across various student models.
Feature-based KD lets student models retain up to 98% of teacher performance with only 5% of the parameters.
Similarity to teacher architecture does not guarantee better results.
Abstract
Pre-trained language models (PLMs) have emerged as powerful tools for code understanding. However, deploying these PLMs in large-scale applications faces practical challenges due to their computational intensity and inference latency. Knowledge distillation (KD), a promising model compression and acceleration technique, addresses these limitations by transferring knowledge from large teacher models to compact student models, enabling efficient inference while preserving most of the teacher models' capabilities. While this technique has shown remarkable success in natural language processing and computer vision domains, its potential for code understanding tasks remains largely underexplored. In this paper, we systematically investigate the effectiveness and usage of KD in code understanding tasks. Our study encompasses two popular types of KD methods, i.e., logit-based and…
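As a rough illustration of the logit-based family mentioned in the abstract, the following is a minimal sketch of a generic temperature-scaled distillation loss: a KL term between softened teacher and student distributions, mixed with cross-entropy on the true label. It is not taken from the paper. The function names `softmax` and `kd_loss`, and the default values for `temperature` and `alpha`, are assumptions made for this example.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of raw logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """Generic logit-based KD loss for one example (hypothetical helper).

    alpha weights the soft-target term; (1 - alpha) weights the
    hard-label cross-entropy. Both defaults are illustrative assumptions.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Standard cross-entropy of the student against the true label.
    ce = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft-target gradient scale comparable across temperatures.
    return alpha * (temperature ** 2) * kl + (1 - alpha) * ce

# A student that agrees with the teacher incurs a lower loss than one that disagrees.
teacher = [3.0, 0.0]
print(kd_loss([3.0, 0.0], teacher, label=0))
print(kd_loss([0.0, 3.0], teacher, label=0))
```

Feature-based methods differ in that they match intermediate representations of the two models rather than only the output logits, so they would need access to hidden states that this sketch does not model.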
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)
