Activation Map Adaptation for Effective Knowledge Distillation
Zhiyuan Wu, Hong Qi, Yu Jiang, Minghao Zhao, Chupeng Cui, Zongmin Yang, and Xinhui Xue

TL;DR
This paper introduces an activation map adaptation method for knowledge distillation that improves student network accuracy and training speed by adaptively selecting supervisory features during training.
Contribution
It proposes a novel activation map adaptive module to enhance knowledge transfer in neural network compression, improving accuracy and efficiency.
Findings
Boosts student network accuracy by 0.6%
Reduces training loss by 6.5%
Speeds up training process
Abstract
Model compression becomes a recent trend due to the requirement of deploying neural networks on embedded and mobile devices. Hence, both accuracy and efficiency are of critical importance. To explore a balance between them, a knowledge distillation strategy is proposed for general visual representation learning. It utilizes our well-designed activation map adaptive module to replace some blocks of the teacher network, exploring the most appropriate supervisory features adaptively during the training process. Using the teacher's hidden layer output to prompt the student network to train so as to transfer effective semantic information.To verify the effectiveness of our strategy, this paper applied our method to cifar-10 dataset. Results demonstrate that the method can boost the accuracy of the student network by 0.6% with 6.5% loss reduction, and significantly improve its training speed.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
MethodsKnowledge Distillation
