What Knowledge Gets Distilled in Knowledge Distillation?

Utkarsh Ojha; Yuheng Li; Anirudh Sundara Rajan; Yingyu Liang; Yong Jae Lee

arXiv:2205.16004 · cs.CV · November 7, 2023 · 6 cites

TL;DR

This paper investigates the nature of the knowledge transferred during knowledge distillation, revealing that it encompasses various properties beyond task accuracy, with implications for understanding and improving the process.

Contribution

It provides a comprehensive analysis of what knowledge is distilled, exploring properties like localization, adversarial robustness, and invariance, which were previously not well understood.

Findings

01. Distillation transfers properties like object localization and invariance.

02. Existing methods indirectly distill multiple properties beyond task performance.

03. Insights have practical implications for designing better distillation techniques.

Abstract

Knowledge distillation aims to transfer useful information from a teacher network to a student network, with the primary goal of improving the student's performance for the task at hand. Over the years, there has been a deluge of novel techniques and use cases of knowledge distillation. Yet, despite the various improvements, there seems to be a glaring gap in the community's fundamental understanding of the process. Specifically, what is the knowledge that gets distilled in knowledge distillation? In other words, in what ways does the student become similar to the teacher? Does it start to localize objects in the same way? Does it get fooled by the same adversarial samples? Do its data invariance properties become similar? Our work presents a comprehensive study to try to answer these questions. We show that existing methods can indeed indirectly distill these properties beyond…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning