Toward Student-Oriented Teacher Network Training For Knowledge Distillation

Chengyu Dong; Liyuan Liu; Jingbo Shang

arXiv:2206.06661·cs.LG·May 10, 2024·1 cites

Open Access

TL;DR

This paper proposes a novel teacher training method called SoTeacher, designed to optimize teacher performance specifically for knowledge distillation, leading to improved student accuracy across various datasets and algorithms.

Contribution

It introduces a theoretical framework linking teacher training to true label distribution approximation and proposes a practical training method incorporating Lipschitz and consistency regularization.
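As a rough illustration of what "ERM plus Lipschitz and consistency regularization" can look like for a single training example, the sketch below adds two penalty terms to a cross-entropy loss. Everything in it is an assumption made for illustration, not the paper's formulation: the function names, the squared-difference consistency term between two views of the same input, the finite-difference proxy for the Lipschitz constant, and the weights `lam_cr` and `lam_lr` are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cross_entropy(probs, label):
    """Standard ERM term: negative log-probability of the observed label."""
    return -math.log(probs[label])


def consistency_penalty(probs_a, probs_b):
    """Hypothetical consistency term: mean squared gap between the
    predictions on two perturbed views of the same input."""
    return sum((a - b) ** 2 for a, b in zip(probs_a, probs_b)) / len(probs_a)


def lipschitz_penalty(x_a, x_b, feat_a, feat_b, bound=1.0):
    """Hypothetical Lipschitz proxy: penalize the feature extractor when
    ||f(x_a) - f(x_b)|| / ||x_a - x_b|| exceeds `bound`."""
    dist_in = euclidean(x_a, x_b)
    if dist_in == 0.0:
        return 0.0
    ratio = euclidean(feat_a, feat_b) / dist_in
    return max(0.0, ratio - bound) ** 2


def teacher_loss(logits_a, logits_b, label, x_a, x_b, feat_a, feat_b,
                 lam_cr=1.0, lam_lr=1.0):
    """Cross-entropy on view A plus the two weighted penalties (assumed form)."""
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    erm = cross_entropy(p_a, label)
    cr = consistency_penalty(p_a, p_b)
    lr = lipschitz_penalty(x_a, x_b, feat_a, feat_b)
    return erm + lam_cr * cr + lam_lr * lr
```

When the two views are identical, both penalties vanish and `teacher_loss` reduces to plain cross-entropy, which makes the regularizers easy to ablate by setting `lam_cr` or `lam_lr` to zero.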

Findings

01. SoTeacher improves student accuracy consistently.

02. Theoretical analysis connects teacher training to label distribution approximation.

03. Empirical results validate the effectiveness of SoTeacher across datasets.

Abstract

How to conduct teacher training for knowledge distillation is still an open problem. It has been widely observed that a best-performing teacher does not necessarily yield the best-performing student, suggesting a fundamental discrepancy between the current teacher training practice and the ideal teacher training strategy. To fill this gap, we explore the feasibility of training a teacher that is oriented toward student performance with empirical risk minimization (ERM). Our analyses are inspired by the recent findings that the effectiveness of knowledge distillation hinges on the teacher's capability to approximate the true label distribution of training inputs. We theoretically establish that the ERM minimizer can approximate the true label distribution of training data as long as the feature extractor of the learner network is Lipschitz continuous and is robust to feature…
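For context on why the teacher's predicted label distribution matters to the student, the following is a minimal sketch of the widely used temperature-scaled distillation objective, in which the student is trained to match the teacher's softened output distribution. It is shown as general background rather than as this paper's setup; the function names and the default `temperature=4.0` are assumptions chosen for illustration.

```python
import math


def softened(logits, temperature):
    """Softmax of logits divided by a temperature (higher = flatter)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """Temperature-scaled KL between teacher and student predictions.
    The T^2 factor keeps gradient magnitudes comparable across temperatures."""
    p_teacher = softened(teacher_logits, temperature)
    p_student = softened(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p_teacher, p_student)
```

Because the student's target is the teacher's distribution itself, any gap between that distribution and the true label distribution is passed on to the student, which is the dependence the abstract refers to.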

Taxonomy

Topics: Neural Networks and Applications · Advanced Graph Neural Networks · Domain Adaptation and Few-Shot Learning

Methods: Knowledge Distillation