AUTOKD: Automatic Knowledge Distillation Into A Student Architecture Family
Roy Henha Eyono, Fabio Maria Carlucci, Pedro M Esperança, Binxin Ru, Phillip Torr

TL;DR
AutoKD uses Bayesian Optimization over a flexible search space to automatically discover effective student neural network architectures and knowledge distillation parameters, improving efficiency and performance across multiple datasets.
Contribution
The paper introduces AutoKD, a Bayesian Optimization-based method that automatically searches for a family of effective student architectures and KD parameters, reducing human effort and computational cost.
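To make "KD parameters" concrete, here is a minimal sketch of the standard distillation objective (in the style of Hinton et al.), which may differ from the exact loss used in AutoKD. The function names and the two parameters shown, `temperature` and `alpha`, are assumptions chosen for illustration; they are the kind of values a search like the one described could tune alongside the architecture.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and soft-label KL divergence.

    temperature and alpha are illustrative KD parameters (assumed, not taken
    from the paper): temperature softens both distributions, and alpha weights
    the distillation term against the ground-truth term.
    """
    # Cross-entropy of the student against the ground-truth class (T = 1).
    ce = -math.log(softmax(student_logits)[label])

    # KL(teacher || student) between temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)

    # The T^2 factor keeps the gradient scale of the soft term comparable
    # across temperatures.
    return (1 - alpha) * ce + alpha * temperature ** 2 * kl
```

When the student reproduces the teacher's logits exactly, the KL term vanishes, so with `alpha=1.0` the loss is zero; with `alpha=0.0` the loss reduces to ordinary cross-entropy on the labels.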
Findings
AutoKD achieves teacher-level performance with 3x less memory.
AutoKD is 20x more sample efficient than existing NAS methods.
AutoKD outperforms advanced KD variants with hand-designed students.
Abstract
State-of-the-art results in deep learning have been improving steadily, in good part due to the use of larger models. However, widespread use is constrained by device hardware limitations, resulting in a substantial performance gap between state-of-the-art models and those that can be effectively deployed on small devices. While Knowledge Distillation (KD) theoretically enables small student models to emulate larger teacher models, in practice selecting a good student architecture requires considerable human expertise. Neural Architecture Search (NAS) appears as a natural solution to this problem but most approaches can be inefficient, as most of the computation is spent comparing architectures sampled from the same distribution, with negligible differences in performance. In this paper, we propose to instead search for a family of student architectures sharing the property of being…
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
Methods: Knowledge Distillation
