Knowledge Distillation Layer that Lets the Student Decide
Ada Gorgun, Yeti Z. Gurbuz, A. Aydin Alatan

TL;DR
This paper introduces a learnable knowledge distillation layer that lets the student model explicitly decide how to leverage teacher knowledge, improving feature transfer during training and keeping the transferred knowledge available at inference.
Contribution
It proposes a novel KD layer through which the student explicitly learns how to use teacher knowledge, strengthening feature transfer in intermediate layers and feeding the transferred knowledge forward to deeper layers.
Findings
Improved accuracy on 3 classification benchmarks
Effective feature transfer in intermediate layers
Learned templates improve knowledge utilization
Abstract
A typical technique in knowledge distillation (KD) is to regularize the learning of a limited-capacity model (the student) by pushing its responses to match those of a powerful model (the teacher). Although this is useful, especially in the penultimate layer and beyond, its effect on the student's feature transform is rather implicit, which limits its use in the intermediate layers. To explicitly embed the teacher's knowledge in the feature transform, we propose a learnable KD layer for the student which improves KD with two distinct abilities: i) learning how to leverage the teacher's knowledge, enabling it to discard nuisance information, and ii) feeding the transferred knowledge forward to deeper layers. Thus, the student benefits from the teacher's knowledge during inference as well as training. Formally, we repurpose a 1x1-BN-ReLU-1x1 convolution block to assign a semantic vector to each local region according to the template (supervised…
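As a rough illustration of the 1x1-BN-ReLU-1x1 block named in the abstract, the sketch below applies that sequence to a flattened feature map using only the Python standard library. It is a minimal sketch under stated assumptions, not the authors' layer: the function names, tensor sizes, and random weights are hypothetical, batch normalization uses batch statistics without a learnable scale or shift, and the template-based assignment (which the truncated abstract does not fully describe) is not modeled.

```python
import math
import random


def conv1x1(pixels, weight, bias):
    """1x1 convolution: the same linear map applied at every local region.

    pixels: list of feature vectors (one per spatial location), each length C_in
    weight: C_out x C_in matrix (list of rows); bias: list of length C_out
    """
    return [
        [sum(w * x for w, x in zip(row, px)) + b for row, b in zip(weight, bias)]
        for px in pixels
    ]


def batch_norm(pixels, eps=1e-5):
    """Normalize each channel to zero mean and unit variance over all locations.

    Assumption: batch statistics only, no learnable scale/shift, for brevity.
    """
    n = len(pixels)
    channels = range(len(pixels[0]))
    means = [sum(px[c] for px in pixels) / n for c in channels]
    variances = [sum((px[c] - means[c]) ** 2 for px in pixels) / n for c in channels]
    return [
        [(px[c] - means[c]) / math.sqrt(variances[c] + eps) for c in channels]
        for px in pixels
    ]


def relu(pixels):
    """Element-wise max(0, x)."""
    return [[max(0.0, v) for v in px] for px in pixels]


def conv_bn_relu_conv(pixels, w1, b1, w2, b2):
    """The 1x1-BN-ReLU-1x1 sequence: one output vector per local region."""
    hidden = relu(batch_norm(conv1x1(pixels, w1, b1)))
    return conv1x1(hidden, w2, b2)


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights or input features."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: a 4x4 feature map with 8 input channels.
rng = random.Random(0)
H, W, C_IN, C_HID, C_OUT = 4, 4, 8, 16, 8
feature_map = random_matrix(H * W, C_IN, rng)  # H*W local regions, flattened
w1, b1 = random_matrix(C_HID, C_IN, rng), [0.0] * C_HID
w2, b2 = random_matrix(C_OUT, C_HID, rng), [0.0] * C_OUT

out = conv_bn_relu_conv(feature_map, w1, b1, w2, b2)
print(len(out), len(out[0]))  # 16 8
```

Because every convolution here is 1x1, each local region is transformed independently except for the shared normalization statistics, which is what allows the block to produce one vector per region.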
Taxonomy
Topics: Advanced Graph Neural Networks · Explainable Artificial Intelligence (XAI) · Online Learning and Analytics
Methods: Convolution · Knowledge Distillation
