Understanding and Improving Knowledge Distillation

Jiaxi Tang; Rakesh Shivanna; Zhe Zhao; Dong Lin; Anima Singh; Ed H. Chi; Sagar Jain

arXiv:2002.03532·cs.LG·March 2, 2021·89 cites

TL;DR

This paper provides a comprehensive analysis of knowledge distillation, categorizing the types of teacher knowledge and examining their effects on student training, leading to better understanding and diagnosis of KD's successes and failures.

Contribution

The paper introduces a hierarchical categorization of the teacher's knowledge in KD and systematically analyzes how each level affects the student's training dynamics.

Findings

01. The teacher's knowledge of the "universe" acts as regularization through label smoothing.

02. Domain knowledge injects a prior on class relationships into the student.

03. Instance-specific knowledge rescales gradients based on example difficulty.
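
The gradient-rescaling finding can be illustrated with a minimal sketch. It assumes the standard softmax cross-entropy at temperature 1, for which the gradient with respect to the student logits is the student probabilities minus the target distribution. The function names and the example probabilities below are hypothetical and chosen only for illustration; the paper's own derivation may differ in its details.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_wrt_logits(target, logits):
    """Gradient of cross-entropy(target, softmax(logits)) w.r.t. logits.

    For a target distribution that sums to 1 this is simply q - target,
    where q is the student's softmax output.
    """
    q = softmax(logits)
    return [qi - ti for qi, ti in zip(q, target)]


def true_class_gradient_ratio(teacher_probs, student_logits, true_class):
    """Ratio of the soft-target gradient to the hard-label gradient
    on the true-class logit: (q_t - p_t) / (q_t - 1)."""
    one_hot = [1.0 if i == true_class else 0.0
               for i in range(len(student_logits))]
    g_soft = grad_wrt_logits(teacher_probs, student_logits)[true_class]
    g_hard = grad_wrt_logits(one_hot, student_logits)[true_class]
    return g_soft / g_hard


# Hypothetical student logits and teacher outputs for two examples of class 0.
student_logits = [1.0, 0.5, -0.5]
easy = true_class_gradient_ratio([0.95, 0.03, 0.02], student_logits, 0)
hard = true_class_gradient_ratio([0.55, 0.40, 0.05], student_logits, 0)
```

In this toy setting the ratio is close to 1 when the teacher is confident in the true class and much smaller when the teacher is uncertain, so the example the teacher finds harder contributes a smaller true-class gradient than it would under the hard label alone.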

Abstract

Knowledge Distillation (KD) is a model-agnostic technique to improve model quality while having a fixed capacity budget. It is a commonly used technique for model compression, where a larger capacity teacher model with better quality is used to train a more compact student model with better inference efficiency. Through distillation, one hopes to benefit from the student's compactness without sacrificing too much model quality. Despite the large success of knowledge distillation, how it benefits the student model's training dynamics remains under-explored. In this paper, we categorize the teacher's knowledge into three hierarchical levels and study its effects on knowledge distillation: (1) knowledge of the 'universe', where KD brings a regularization effect through label smoothing; (2) domain knowledge, where teacher injects class relationships prior to student's logit…
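
The link between KD and label smoothing mentioned in the abstract can be checked numerically. The sketch below assumes the common formulation of the KD objective as a weighted sum of a hard-label cross-entropy and a teacher cross-entropy at temperature 1; the weight `alpha` and all function names are illustrative assumptions, not the paper's notation. Because cross-entropy is linear in the target distribution, a teacher that outputs the uniform distribution makes this objective identical to label smoothing.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, probs):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)


def kd_loss(student_logits, teacher_probs, label, alpha):
    """Assumed KD objective: (1 - alpha) * CE(one_hot, q) + alpha * CE(teacher, q)."""
    q = softmax(student_logits)
    one_hot = [1.0 if i == label else 0.0 for i in range(len(q))]
    return ((1.0 - alpha) * cross_entropy(one_hot, q)
            + alpha * cross_entropy(teacher_probs, q))


def label_smoothing_loss(student_logits, label, alpha):
    """Cross-entropy against a one-hot target mixed with the uniform distribution."""
    k = len(student_logits)
    q = softmax(student_logits)
    smoothed = [(1.0 - alpha) * (1.0 if i == label else 0.0) + alpha / k
                for i in range(k)]
    return cross_entropy(smoothed, q)


# Hypothetical student logits; the "teacher" here knows nothing beyond
# the number of classes, so it predicts the uniform distribution.
logits = [2.0, 0.3, -1.0, 0.5]
uniform_teacher = [0.25, 0.25, 0.25, 0.25]
loss_kd = kd_loss(logits, uniform_teacher, label=0, alpha=0.1)
loss_ls = label_smoothing_loss(logits, label=0, alpha=0.1)
```

Here `loss_kd` and `loss_ls` agree up to floating-point error. A real teacher replaces the uniform distribution with class probabilities that vary per class and per example, which is where the other two levels of knowledge enter.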

Taxonomy

Topics: Machine Learning and Data Classification · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)