Class Attention Transfer Based Knowledge Distillation
Ziyao Guo, Haonan Yan, Hui Li, Xiaodong Lin

TL;DR
This paper introduces CAT-KD, a knowledge distillation method that enhances interpretability by transferring class activation maps, achieving state-of-the-art performance while providing a better understanding of how CNNs make decisions.
Contribution
The paper proposes a novel class attention transfer based knowledge distillation method that improves both the interpretability and the performance of CNN models.
Findings
CAT-KD achieves state-of-the-art results on multiple benchmarks.
Transferring class activation maps enhances CNN interpretability.
The method provides insights into CNN decision-making processes.
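One way to see why class activation maps reflect a CNN's decision, assuming the common classification head of global average pooling (GAP) followed by a fully connected (FC) layer: both operations are linear, so their order can be swapped. With a final feature map $F \in \mathbb{R}^{C \times H \times W}$, FC weights $V \in \mathbb{R}^{K \times C}$ and bias $b \in \mathbb{R}^{K}$ (notation introduced here for illustration, not taken from the paper), the logit of class $k$ is

$$
z_k \;=\; \sum_{c=1}^{C} V_{kc}\,\frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W} F_{c,i,j} + b_k
\;=\; \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\underbrace{\sum_{c=1}^{C} V_{kc}\,F_{c,i,j}}_{M_k(i,j)} + b_k .
$$

Each logit is therefore, up to the bias, the spatial average of the class activation map $M_k$, so the regions where $M_k$ is large are the ones driving the prediction for class $k$.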
Abstract
Previous knowledge distillation methods have shown impressive performance on model compression tasks; however, it is hard to explain how the knowledge they transfer helps to improve the performance of the student network. In this work, we focus on proposing a knowledge distillation method that has both high interpretability and competitive performance. We first revisit the structure of mainstream CNN models and reveal that possessing the capacity of identifying class discriminative regions of input is critical for CNN to perform classification. Furthermore, we demonstrate that this capacity can be obtained and enhanced by transferring class activation maps. Based on our findings, we propose class attention transfer based knowledge distillation (CAT-KD). Different from previous KD methods, we explore and present several properties of the knowledge transferred by our method,…
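The central mechanism of the abstract, transferring class activation maps from teacher to student, can be sketched as below. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: it applies each network's FC weights as a 1x1 convolution to get one map per class, average-pools the maps to a small fixed grid (`out_size`, assumed to be 2), L2-normalizes each class map, and takes the mean squared error between student and teacher maps. The function names, the pooling size and the normalization choice are all hypothetical.

```python
import numpy as np


def class_activation_maps(features, fc_weight):
    """features: (C, H, W) final conv feature map; fc_weight: (K, C) FC weights.

    Returns (K, H, W): the FC layer applied as a 1x1 convolution at every
    spatial position, i.e. one class activation map per class (bias omitted).
    """
    return np.einsum("kc,chw->khw", fc_weight, features)


def pool_and_normalize(cams, out_size=2):
    """Average-pool each map to out_size x out_size, then L2-normalize it.

    Assumes H and W are divisible by out_size. Returns (K, out_size**2).
    """
    k, h, w = cams.shape
    # Split H and W into (out_size, block) pairs and average within blocks.
    pooled = cams.reshape(k, out_size, h // out_size, out_size, w // out_size)
    pooled = pooled.mean(axis=(2, 4))
    flat = pooled.reshape(k, -1)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    return flat / np.maximum(norms, 1e-12)  # guard against all-zero maps


def cat_loss(student_feats, student_fc, teacher_feats, teacher_fc, out_size=2):
    """Hypothetical class-attention-transfer loss: MSE between the pooled,
    normalized class activation maps of student and teacher."""
    s = pool_and_normalize(class_activation_maps(student_feats, student_fc), out_size)
    t = pool_and_normalize(class_activation_maps(teacher_feats, teacher_fc), out_size)
    return float(np.mean((s - t) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Teacher: 64 channels at 8x8; student: 32 channels at 4x4; 10 classes.
    teacher_feats, teacher_fc = rng.standard_normal((64, 8, 8)), rng.standard_normal((10, 64))
    student_feats, student_fc = rng.standard_normal((32, 4, 4)), rng.standard_normal((10, 32))
    print(cat_loss(student_feats, student_fc, teacher_feats, teacher_fc))
```

Pooling both networks' maps to the same grid lets teacher and student differ in channel count and spatial resolution, as long as they predict the same set of classes. In a full training loop this term would be added to the student's usual classification loss.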
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: Class Attention · Knowledge Distillation
