Learning Student-Friendly Teacher Networks for Knowledge Distillation
Dae Young Park, Moon-Hyun Cha, Changwook Jeong, Dae Sin Kim, Bohyung Han

TL;DR
This paper introduces a new knowledge distillation method that trains teacher models to be more student-friendly, enhancing the transfer of dark knowledge and improving student model performance across various architectures.
Contribution
It proposes training teacher models jointly with student branches so that the teacher learns representations suited to knowledge transfer, unlike conventional methods, which only optimize the student against a fixed pretrained teacher.
Findings
Improves accuracy of student models across different distillation techniques.
Enhances convergence speed of student models.
Effective even with heterogeneous teacher-student architectures.
Abstract
We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student. Contrary to most of the existing methods that rely on effective training of student models given pretrained teachers, we aim to learn the teacher models that are friendly to students and, consequently, more appropriate for knowledge transfer. In other words, at the time of optimizing a teacher model, the proposed algorithm learns the student branches jointly to obtain student-friendly representations. Since the main goal of our approach lies in training teacher models and the subsequent knowledge distillation procedure is straightforward, most of the existing knowledge distillation methods can adopt this technique to improve the performance of diverse student models in terms of accuracy and convergence speed. The proposed algorithm demonstrates outstanding…
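The joint training idea described in the abstract can be illustrated with a minimal sketch. The version below is an assumption-laden illustration, not the authors' implementation: it assumes the teacher objective is a weighted sum of a cross-entropy term for the teacher, a cross-entropy term for a student branch, and a temperature-softened KL term between the two outputs. The function names, the weights (w_teacher, w_branch_ce, w_branch_kl), the temperature value, and the KL direction (teacher to branch) are all hypothetical choices made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of the (temperature 1) softmax against an integer class label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def joint_teacher_loss(teacher_logits, branch_logits, label,
                       temperature=4.0,
                       w_teacher=1.0, w_branch_ce=1.0, w_branch_kl=1.0):
    """Hypothetical joint objective for one example.

    teacher_logits: output of the teacher head.
    branch_logits:  output of a student branch attached to the teacher.
    All weights and the temperature are illustrative assumptions.
    """
    ce_teacher = cross_entropy(teacher_logits, label)
    ce_branch = cross_entropy(branch_logits, label)
    p_teacher = softmax(teacher_logits, temperature)
    p_branch = softmax(branch_logits, temperature)
    # Scaling the softened KL term by temperature**2 is a common
    # distillation convention; it is assumed here, not taken from the paper.
    kl = (temperature ** 2) * kl_divergence(p_teacher, p_branch)
    return w_teacher * ce_teacher + w_branch_ce * ce_branch + w_branch_kl * kl


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]
    branch = [1.2, 0.9, -0.3]
    print(joint_teacher_loss(teacher, branch, label=0))
```

In this sketch the KL term vanishes when the branch matches the teacher exactly, so minimizing the sum pushes the teacher toward outputs that the student branch can reproduce. In an actual training loop the gradients of this loss would flow into both the teacher and the branch parameters.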
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
Methods: Knowledge Distillation
