Student-friendly Knowledge Distillation

Mengyang Yuan; Bo Lang; Fengnan Quan

arXiv:2305.10893 · cs.CV · May 19, 2023 · 1 citation

Open Access

TL;DR

This paper introduces student-friendly knowledge distillation (SKD), which simplifies teacher outputs to improve student learning efficiency and effectiveness, achieving state-of-the-art results on CIFAR-100 and ImageNet.

Contribution

The paper proposes a novel SKD method that simplifies teacher knowledge using softening and attention-based learning simplification, enhancing distillation effectiveness.

Findings

01. Achieves state-of-the-art performance on CIFAR-100 and ImageNet.

02. Maintains high training efficiency.

03. Easily combined with other distillation methods.

Abstract

In knowledge distillation, the knowledge from the teacher model is often too complex for the student model to thoroughly process. However, good teachers in real life always simplify complex material before teaching it to students. Inspired by this fact, we propose student-friendly knowledge distillation (SKD) to simplify teacher output into new knowledge representations, which makes the learning of the student model easier and more effective. SKD contains a softening processing and a learning simplifier. First, the softening processing uses the temperature hyperparameter to soften the output logits of the teacher model, which simplifies the output to some extent and makes it easier for the learning simplifier to process. The learning simplifier utilizes the attention mechanism to further simplify the knowledge of the teacher model and is jointly trained with the student model using the…
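The two stages named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the temperature value, the toy logits, and the helper names (`soften`, `self_attention`) are assumptions made for illustration, and the attention step is plain scaled dot-product self-attention without learned projections, standing in for the learning simplifier, whose actual architecture and training loss are not given in the text above.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def soften(logits, temperature):
    """Divide logits by a temperature; values above 1 flatten the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return [z / temperature for z in logits]


def self_attention(rows):
    """Scaled dot-product self-attention with no learned weights (an assumption).

    Each output row is a convex combination of all input rows, so every
    sample's logits are mixed with those of the other samples in the batch.
    """
    dim = len(rows[0])
    mixed = []
    for query in rows:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in rows
        ]
        weights = softmax(scores)
        mixed.append(
            [sum(w * row[j] for w, row in zip(weights, rows)) for j in range(dim)]
        )
    return mixed


# Hypothetical teacher logits for a batch of two samples and three classes.
teacher_logits = [[8.0, 2.0, 0.5], [1.0, 6.0, 1.5]]
temperature = 4.0  # assumed value, not taken from the paper

# Stage 1: softening with the temperature hyperparameter.
softened = [soften(row, temperature) for row in teacher_logits]
sharp_probs = softmax(teacher_logits[0])
soft_probs = softmax(softened[0])

# Stage 2 (stand-in): attention over the softened logits of the batch.
mixed = self_attention(softened)

print("sharp:", [round(p, 3) for p in sharp_probs])
print("soft: ", [round(p, 3) for p in soft_probs])
```

Softening keeps the teacher's predicted class but lowers its peak probability, so the remaining classes carry more of the signal that the student is asked to match.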

Taxonomy

Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Domain Adaptation and Few-Shot Learning

Methods: Knowledge Distillation