Channel Distillation: Channel-Wise Attention for Knowledge Distillation
Zaida Zhou, Chaoran Zhuge, Xinwei Guan, Wen Liu

TL;DR
This paper introduces a knowledge distillation method built around Channel Distillation, which transfers channel-wise attention from teacher to student and adds a loss decay strategy to improve student network performance. It is reported to outperform existing distillation methods on ImageNet and CIFAR100.
Contribution
The paper proposes a new distillation approach combining channel-wise attention, guided knowledge transfer, and loss decay to enhance student network training.
Findings
Achieves 27.68% top-1 error on ImageNet with a ResNet18 student.
Student outperforms teacher on CIFAR100.
Outperforms state-of-the-art distillation methods.
Abstract
Knowledge distillation transfers the knowledge that a teacher network has learned from data to a student network, so that the student needs fewer parameters and less computation while reaching accuracy close to the teacher's. In this paper, we propose a new distillation method that comprises two transfer strategies and a loss decay strategy. The first transfer strategy, Channel Distillation (CD), is based on channel-wise attention and transfers channel information from the teacher to the student. The second is Guided Knowledge Distillation (GKD). Unlike Knowledge Distillation (KD), in which the student mimics the teacher's prediction distribution for every sample, GKD has the student mimic only the teacher's correct outputs. The last part is Early Decay Teacher (EDT). During the training process, we gradually decay the weight of…
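The two transfer strategies named in the abstract can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function names (`gkd_loss`, `cd_loss`, `channel_attention`) are hypothetical, and several details are assumptions that the abstract does not state, namely that channel attention is computed by global average pooling, that channel statistics are compared with a mean squared difference, and that GKD uses a temperature-softened KL divergence. EDT is left out because the abstract is truncated before it is described.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - lse for z in scaled]


def gkd_loss(student_logits, teacher_logits, labels, temperature=1.0):
    """Mean KL(teacher || student), restricted to samples the teacher gets right.

    Assumption: KL divergence with temperature scaling; the abstract only says
    that the student mimics the teacher's correct outputs.
    """
    total, kept = 0.0, 0
    for s, t, y in zip(student_logits, teacher_logits, labels):
        teacher_pred = max(range(len(t)), key=lambda i: t[i])
        if teacher_pred != y:
            continue  # teacher is wrong on this sample, so it gives no guidance
        lp = log_softmax(t, temperature)
        lq = log_softmax(s, temperature)
        total += sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))
        kept += 1
    return total / kept if kept else 0.0


def channel_attention(feature_map):
    """One scalar per channel of a [C][H][W] nested list.

    Assumption: global average pooling stands in for channel-wise attention.
    """
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def cd_loss(student_feat, teacher_feat):
    """Mean squared difference between student and teacher channel statistics.

    Assumption: both feature maps have the same number of channels.
    """
    ws = channel_attention(student_feat)
    wt = channel_attention(teacher_feat)
    return sum((a - b) ** 2 for a, b in zip(ws, wt)) / len(ws)
```

In this sketch a sample contributes to `gkd_loss` only when the teacher's arg-max prediction matches the label, which is the difference from plain KD that the abstract describes. A batch in which the teacher is wrong everywhere yields a loss of zero rather than an error.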
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: Knowledge Distillation
