Improving Knowledge Distillation with Teacher's Explanation
Sayantan Chowdhury, Ben Liang, Ali Tizghadam, and Ilijc Albanese

TL;DR
This paper introduces Knowledge Explaining Distillation (KED), a knowledge distillation framework in which the student learns from the teacher's explanations as well as its predictions. KED students outperform conventional KD students of similar complexity.
Contribution
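The paper proposes the Knowledge Explaining Distillation (KED) framework, in which superfeature-explaining teachers provide explanations over groups of features. It extends KD to transfer explanations alongside predictions, and extends KED in turn to reduce complexity in convolutional neural networks.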
Findings
KED students outperform KD students of similar complexity.
KED extends to reducing the complexity of convolutional neural networks (CNNs).
KED works well with limited training data.
Abstract
Knowledge distillation (KD) improves the performance of a low-complexity student model with the help of a more powerful teacher. The teacher in KD is a black-box model, imparting knowledge to the student only through its predictions. This limits the amount of transferred knowledge. In this work, we introduce a novel Knowledge Explaining Distillation (KED) framework, which allows the student to learn not only from the teacher's predictions but also from the teacher's explanations. We propose a class of superfeature-explaining teachers that provide explanation over groups of features, along with the corresponding student model. We also present a method for constructing the superfeatures. We then extend KED to reduce complexity in convolutional neural networks, to allow augmentation with hidden-representation distillation methods, and to work with a limited amount of training data using…
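To make the contrast between prediction-only distillation and explanation-augmented distillation concrete, here is a minimal NumPy sketch. The `kd_loss` function is the widely used temperature-softened KD objective. The `explanation_loss` and `ked_style_loss` functions are hypothetical illustrations only: the abstract does not state KED's loss, so the mean-squared-error term over per-superfeature explanation scores, the weight `beta`, and all function and parameter names are assumptions, not the authors' formulation.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax with temperature scaling."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    """Standard KD: cross-entropy on labels plus KL to the softened teacher predictions."""
    n = student_logits.shape[0]
    eps = 1e-12
    # Hard-label cross-entropy at temperature 1.
    p_student = softmax(student_logits)
    ce = -np.log(p_student[np.arange(n), labels] + eps).mean()
    # KL(teacher || student) on temperature-softened distributions.
    q_t = softmax(teacher_logits, temperature)
    q_s = softmax(student_logits, temperature)
    kl = (q_t * (np.log(q_t + eps) - np.log(q_s + eps))).sum(axis=1).mean()
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return (1.0 - alpha) * ce + alpha * temperature**2 * kl

def explanation_loss(student_expl, teacher_expl):
    """HYPOTHETICAL: MSE between per-superfeature explanation scores.

    Both arrays have shape (batch, num_superfeatures). This matching term is an
    assumption for illustration, not the loss defined in the paper.
    """
    return float(np.mean((student_expl - teacher_expl) ** 2))

def ked_style_loss(student_logits, teacher_logits, labels,
                   student_expl, teacher_expl, beta=1.0, **kd_kwargs):
    """HYPOTHETICAL combined objective: KD loss plus a weighted explanation term."""
    return (kd_loss(student_logits, teacher_logits, labels, **kd_kwargs)
            + beta * explanation_loss(student_expl, teacher_expl))
```

In this sketch the student receives a training signal from two sources: the teacher's softened predictions, as in conventional KD, and the teacher's explanation scores over groups of features. Setting `beta=0` recovers plain KD.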
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Online Learning and Analytics · Machine Learning and Data Classification
