AUTOKD: Automatic Knowledge Distillation Into A Student Architecture Family

Roy Henha Eyono; Fabio Maria Carlucci; Pedro M Esperança; Binxin Ru; Phillip Torr

arXiv:2111.03555·cs.LG·November 8, 2021

TL;DR

AutoKD automatically searches for effective student neural network architectures and knowledge distillation parameters, using Bayesian Optimization over a flexible search space to improve efficiency and performance across multiple datasets.

Contribution

The paper introduces AutoKD, a Bayesian Optimization-based method that automatically searches for a family of effective student architectures and KD parameters, reducing human effort and computational cost.
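
To make "KD parameters" concrete, the following is a minimal sketch of the classic soft-target distillation loss, not the paper's own implementation. It assumes the tunable parameters are a softmax temperature and a weight that balances the hard-label loss against the teacher-matching loss; the names `kd_loss`, `temperature`, and `alpha` are hypothetical and do not come from this page.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Classic distillation loss for one example (assumed form, see lead-in).

    temperature and alpha are the hypothetical KD parameters that a search
    procedure could tune alongside the student architecture.
    """
    # Hard-label term: cross-entropy of the student at temperature 1.
    ce = -log_softmax(student_logits)[label]

    # Soft-target term: KL(teacher || student) at the given temperature.
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )

    # The T^2 factor keeps the soft-term gradient scale comparable across T.
    return (1.0 - alpha) * ce + alpha * temperature ** 2 * kl


if __name__ == "__main__":
    student = [1.0, 0.5, -0.2]
    teacher = [2.0, 0.1, -1.0]
    print(kd_loss(student, teacher, label=0, temperature=4.0, alpha=0.5))
```

Under this assumption, a Bayesian Optimization loop would treat `temperature` and `alpha` as extra dimensions of the search space next to the architecture choices, scoring each candidate by the validation accuracy of the distilled student.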

Findings

1. AutoKD achieves teacher-level performance with 3x less memory.
2. AutoKD is 20x more sample efficient than existing NAS methods.
3. AutoKD outperforms advanced KD variants with hand-designed students.

Abstract

State-of-the-art results in deep learning have been improving steadily, in good part due to the use of larger models. However, widespread use is constrained by device hardware limitations, resulting in a substantial performance gap between state-of-the-art models and those that can be effectively deployed on small devices. While Knowledge Distillation (KD) theoretically enables small student models to emulate larger teacher models, in practice selecting a good student architecture requires considerable human expertise. Neural Architecture Search (NAS) appears as a natural solution to this problem but most approaches can be inefficient, as most of the computation is spent comparing architectures sampled from the same distribution, with negligible differences in performance. In this paper, we propose to instead search for a family of student architectures sharing the property of being…

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification

Methods: Knowledge Distillation