Supervision Complexity and its Role in Knowledge Distillation
Hrayr Harutyunyan, Ankit Singh Rawat, Aditya Krishna Menon, Seungyeon Kim, Sanjiv Kumar

TL;DR
This paper introduces a theoretical framework based on supervision complexity to explain why knowledge distillation improves a student model's generalization, highlighting the roles of the teacher's accuracy, the student's margin, and the complexity of the teacher's predictions.
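The TL;DR refers to the student's margin with respect to the teacher's predictions. As a rough illustration only, the sketch below assumes the usual multiclass margin: the student's score for the class the teacher predicts, minus the student's highest score on any other class. The function name margin_wrt_teacher is hypothetical, and the paper's bound uses its own margin-based loss, which may be defined differently.

```python
from typing import Sequence


def margin_wrt_teacher(student_logits: Sequence[float],
                       teacher_probs: Sequence[float]) -> float:
    """Student margin at the teacher's predicted class (assumed definition).

    Hypothetical helper: student logit for the teacher's argmax class minus the
    largest student logit among the remaining classes. A positive value means the
    student ranks the teacher's predicted class first.
    """
    if len(student_logits) != len(teacher_probs) or len(student_logits) < 2:
        raise ValueError("need matching score vectors with at least two classes")
    # Class the teacher predicts; max() keeps the first index on ties.
    target = max(range(len(teacher_probs)), key=lambda c: teacher_probs[c])
    runner_up = max(s for c, s in enumerate(student_logits) if c != target)
    return student_logits[target] - runner_up


student = [2.0, 0.5, -1.0]
teacher = [0.7, 0.2, 0.1]
print(margin_wrt_teacher(student, teacher))  # 1.5
```

A negative result signals that the student disagrees with the teacher on that example; margin-based bounds typically penalize examples whose margin falls below some threshold.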
Contribution
It proposes a new theoretical approach that links supervision complexity to the effectiveness of distillation and validates it experimentally on image classification tasks.
Findings
Supervision complexity correlates with distillation success.
Early stopping and temperature scaling are theoretically justified.
Online distillation with increasing supervision complexity improves performance.
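Temperature scaling, one of the techniques named in the findings, divides logits by a temperature T > 1 before the softmax, which flattens the distribution the student is trained to match. The sketch below assumes the common soft-target formulation (cross-entropy between temperature-softened teacher and student distributions); it shows only the mechanics of softening, not the paper's analysis, and the helper names and the default T = 4.0 are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """log softmax(logits / temperature), computed stably by subtracting the max."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; larger temperatures give flatter distributions."""
    return [math.exp(v) for v in log_softmax(logits, temperature)]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 4.0) -> float:
    """Cross-entropy H(p_teacher, p_student), both softened by the same temperature."""
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    return -sum(p * lq for p, lq in zip(p_teacher, log_p_student))


teacher_logits = [4.0, 1.0, 0.0]
for t in (1.0, 4.0):
    # The teacher's top-class probability shrinks as the temperature grows.
    print(t, [round(p, 3) for p in softmax(teacher_logits, t)])
print(distillation_loss([3.0, 1.5, 0.5], teacher_logits))
```

A common convention also multiplies this loss by T^2 and adds a cross-entropy term on the hard labels; both are omitted to keep the example minimal. Computing the student side with log_softmax avoids taking the logarithm of a probability that has underflowed to zero.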
Abstract
Despite the popularity and efficacy of knowledge distillation, there is limited understanding of why it helps. In order to study the generalization behavior of a distilled student, we propose a new theoretical framework that leverages supervision complexity: a measure of alignment between teacher-provided supervision and the student's neural tangent kernel. The framework highlights a delicate interplay among the teacher's accuracy, the student's margin with respect to the teacher predictions, and the complexity of the teacher predictions. Specifically, it provides a rigorous justification for the utility of various techniques that are prevalent in the context of distillation, such as early stopping and temperature scaling. Our analysis further suggests the use of online distillation, where a student receives increasingly more complex supervision from teachers in different stages of…
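The abstract describes supervision complexity only as a measure of alignment between the teacher's supervision and the student's neural tangent kernel. A standard kernel quantity that captures this kind of alignment, assumed here rather than taken from the paper, is sqrt(tr(Y^T K^-1 Y)) for an n x c target matrix Y and an n x n kernel matrix K: it is large when the targets vary along directions in which K has small eigenvalues, i.e. when the kernel model must work hard to fit them. The sketch below swaps in an RBF kernel as a stand-in for the student's NTK; the function names, the gamma and jitter parameters, and the lack of any normalization are all hypothetical choices, and the paper's exact definition may differ.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def rbf_kernel(xs: Sequence[Sequence[float]], gamma: float = 1.0) -> Matrix:
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2); a stand-in for the NTK."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))
             for xj in xs] for xi in xs]


def cholesky(k: Matrix, jitter: float = 1e-8) -> Matrix:
    """Lower-triangular L with L L^T = K + jitter * I (K symmetric positive definite)."""
    n = len(k)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][p] * lower[j][p] for p in range(j))
            if i == j:
                d = k[i][i] + jitter - s
                if d <= 0:
                    raise ValueError("kernel matrix is not positive definite")
                lower[i][j] = math.sqrt(d)
            else:
                lower[i][j] = (k[i][j] - s) / lower[j][j]
    return lower


def supervision_complexity(k: Matrix, targets: Sequence[Sequence[float]]) -> float:
    """sqrt(tr(Y^T K^-1 Y)) for targets Y given as one row per example (assumed form)."""
    lower = cholesky(k)
    n = len(k)
    total = 0.0
    for c in range(len(targets[0])):
        # Forward substitution solves L z = y_c, so y_c^T K^-1 y_c = ||z||^2.
        z: List[float] = []
        for i in range(n):
            acc = sum(lower[i][p] * z[p] for p in range(i))
            z.append((targets[i][c] - acc) / lower[i][i])
        total += sum(v * v for v in z)
    return math.sqrt(total)


xs = [[0.0], [1.0], [2.0], [3.0]]
kernel = rbf_kernel(xs, gamma=1.0)
smooth = [[1.0], [1.0], [-1.0], [-1.0]]       # labels change once along the line
alternating = [[1.0], [-1.0], [1.0], [-1.0]]  # labels flip at every point
print(supervision_complexity(kernel, smooth))       # the smaller of the two
print(supervision_complexity(kernel, alternating))  # larger: poorly aligned with K
```

The Cholesky solve avoids forming K^-1 explicitly, which is the usual numerically safer choice; the small jitter on the diagonal guards against kernel matrices that are only barely positive definite.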
Taxonomy
Topics: Cell Image Analysis Techniques · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: Early Stopping
