Student-friendly Knowledge Distillation
Mengyang Yuan, Bo Lang, Fengnan Quan

TL;DR
This paper introduces student-friendly knowledge distillation (SKD), which simplifies teacher outputs to improve student learning efficiency and effectiveness, achieving state-of-the-art results on CIFAR-100 and ImageNet.
Contribution
The paper proposes SKD, a novel method that simplifies the teacher's knowledge through temperature-based softening and an attention-based learning simplifier, making distillation more effective.
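The softening step is a temperature-scaled softmax over the teacher's logits. The sketch below pairs it with the standard Hinton-style distillation loss, T² · KL(p_teacher ‖ p_student). The loss form and the helper names `softmax_with_temperature` and `kd_loss` are illustrative assumptions, not SKD's exact objective, which the truncated abstract does not spell out.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax: a higher temperature yields a flatter distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits: List[float], student_logits: List[float],
            temperature: float = 4.0) -> float:
    """Hinton-style distillation loss (assumed here): T^2 * KL(p_teacher || p_student)."""
    p = softmax_with_temperature(teacher_logits, temperature)
    q = softmax_with_temperature(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


teacher_logits = [8.0, 2.0, 1.0]
sharp = softmax_with_temperature(teacher_logits, temperature=1.0)  # top class about 0.997
soft = softmax_with_temperature(teacher_logits, temperature=4.0)   # top class about 0.716
loss = kd_loss(teacher_logits, [6.0, 3.0, 1.0], temperature=4.0)
```

Raising the temperature moves probability mass from the top class onto the others, which is the sense in which softening makes the teacher's output "simpler" to imitate.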
Findings
Achieves state-of-the-art performance on CIFAR-100 and ImageNet.
Maintains high training efficiency.
Easily combined with other distillation methods.
Abstract
In knowledge distillation, the knowledge from the teacher model is often too complex for the student model to process thoroughly. Good teachers in real life, however, always simplify complex material before teaching it to students. Inspired by this, we propose student-friendly knowledge distillation (SKD), which simplifies the teacher's output into new knowledge representations that make the student model's learning easier and more effective. SKD consists of two components: a softening process and a learning simplifier. First, the softening process uses a temperature hyperparameter to soften the teacher model's output logits, which simplifies the output to some extent and makes it easier for the learning simplifier to process. The learning simplifier then uses an attention mechanism to further simplify the teacher's knowledge and is trained jointly with the student model using the…
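The abstract is truncated before it describes the learning simplifier's architecture, so the following is only a hypothetical sketch of "an attention mechanism that further simplifies softened teacher outputs". It assumes scaled dot-product self-attention across the samples of a batch, followed by a per-row softmax so each output is again a class distribution. The projections `wq`, `wk`, `wv` are random here; in SKD the simplifier is trained jointly with the student, which this forward-only sketch does not show.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x m) matrix by an (m x p) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Scaled dot-product self-attention over the rows of x (one row per sample)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(wk[0])
    k_t = [list(col) for col in zip(*k)]  # transpose keys: (d x n)
    scores = matmul(q, k_t)               # (n x n) sample-to-sample scores
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)             # (n x C) attended outputs


def simplify_teacher_outputs(soft_teacher: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Hypothetical simplifier: attention over the batch, then renormalize each row
    into a class distribution usable as a distillation target."""
    return [softmax(row) for row in self_attention(soft_teacher, wq, wk, wv)]


random.seed(0)
num_classes, temperature = 3, 4.0


def rand_matrix(rows: int, cols: int) -> Matrix:
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


wq, wk, wv = (rand_matrix(num_classes, num_classes) for _ in range(3))
teacher_logits = [[8.0, 2.0, 1.0], [1.0, 7.0, 2.0], [0.5, 1.0, 6.0], [5.0, 4.0, 1.0]]
soft_teacher = [softmax([z / temperature for z in row]) for row in teacher_logits]
simplified = simplify_teacher_outputs(soft_teacher, wq, wk, wv)
```

Attending across the batch is one plausible reading of "further simplify"; a per-sample attention over classes would be an equally reasonable assumption until the full method description is consulted.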
Taxonomy
Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Domain Adaptation and Few-Shot Learning
Methods: Knowledge Distillation
