MoEKD: Mixture-of-Experts Knowledge Distillation for Robust and High-Performing Compressed Code Models
Md. Abdul Awal, Mrigank Rochan, Chanchal K. Roy

TL;DR
MoEKD introduces a multi-expert knowledge distillation framework that significantly improves robustness and performance of compressed code models by aggregating knowledge from specialized experts, addressing limitations of single-source distillation.
Contribution
The paper proposes MoEKD, a novel KD framework using a Mixture of Experts architecture to enhance robustness and performance in model compression for code understanding tasks.
Findings
Improves adversarial robustness by up to 35.8%.
Enhances predictive performance by up to 13%.
Maintains competitive performance with models reduced by half in size.
Abstract
Large language models for code have achieved strong performance across diverse software analytics tasks, yet their real-world adoption remains limited by high computational demands, slow inference speeds, significant energy consumption, and environmental impact. Knowledge distillation (KD) offers a practical solution by transferring knowledge from a large model to a smaller and more efficient model. Despite its effectiveness, recent studies show that models distilled from a single source often exhibit degraded adversarial robustness, even when robustness-aware distillation techniques are employed. These observations suggest a fundamental limitation of single-source distillation in simultaneously transferring high-quality and robust knowledge. To overcome this limitation, we propose Mixture of Experts Knowledge Distillation (MoEKD), a KD framework that leverages a Mixture of Experts…
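The abstract is truncated and does not say how MoEKD combines its experts. To show what multi-source distillation can look like in general, the sketch below mixes several experts' soft labels before distilling them into a student. It rests on the following assumptions:

- A softmax gate (the hypothetical `gate_logits`) weights each expert's temperature-scaled output distribution.
- The student is trained with the standard Hinton-style loss, which is the KL divergence scaled by T².
- All function and parameter names are illustrative.

This is not the authors' implementation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def aggregate_experts(
    expert_logits: Sequence[Sequence[float]],
    gate_logits: Sequence[float],
    temperature: float,
) -> List[float]:
    """Convex mix of per-expert soft labels.

    ASSUMPTION: a hypothetical softmax gate over `gate_logits` decides how much
    each expert contributes; the paper's actual routing may differ.
    """
    weights = softmax(gate_logits)
    expert_probs = [softmax(logits, temperature) for logits in expert_logits]
    num_classes = len(expert_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, expert_probs))
        for c in range(num_classes)
    ]


def distillation_loss(
    student_logits: Sequence[float],
    expert_logits: Sequence[Sequence[float]],
    gate_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """T^2-scaled KL(mixed expert targets || student), as in standard soft-label KD."""
    target = aggregate_experts(expert_logits, gate_logits, temperature)
    student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(target, student)


# Usage: two hypothetical experts on a binary code task (e.g. "vulnerable" vs "safe").
experts = [[2.0, -1.0], [0.5, 0.5]]
gate = [1.0, 0.0]  # the gate leans toward the first expert
loss = distillation_loss([1.0, 0.0], experts, gate)
```

Because the gated mixture is a convex combination of probability distributions, the student still receives a valid soft-label target. With a single expert, the loss reduces to ordinary single-teacher KD.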
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Software Testing and Debugging Techniques · Advanced Malware Detection Techniques
