Understanding the Effects of Projectors in Knowledge Distillation

Yudong Chen; Sen Wang; Jiajun Liu; Xuwei Xu; Frank de Hoog; Brano Kusy; Zi Huang

arXiv:2310.17183 · cs.CV · October 27, 2023 · 2 cites

TL;DR

This paper investigates the often-overlooked role of projectors in knowledge distillation, revealing their benefits even when feature dimensions match and proposing an ensemble method to enhance distillation performance.

Contribution

It uncovers the positive effects of projectors in knowledge distillation and introduces a projector ensemble approach for improved student model performance.

Findings

1. Students with projectors achieve better accuracy trade-offs.
2. Projectors help preserve teacher-student similarity beyond numeric metrics.
3. The proposed ensemble method outperforms baseline distillation techniques.

Abstract

Conventionally, during the knowledge distillation process (e.g., feature distillation), an additional projector is often required to perform feature transformation due to the dimension mismatch between the teacher and the student networks. Interestingly, we discovered that even if the student and the teacher have the same feature dimensions, adding a projector still helps to improve the distillation performance. In addition, projectors even improve logit distillation if we add them to the architecture too. Inspired by these surprising findings and the general lack of understanding of projectors in the knowledge distillation process in the existing literature, this paper investigates the implicit role that projectors play, which has so far been overlooked. Our empirical study shows that the student with a projector (1) obtains a better trade-off between the training accuracy and the…
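To make the idea of a projector concrete, here is a minimal sketch in plain Python (standard library only). It shows a student feature passed through several linear projectors whose outputs are combined, then compared with a teacher feature of the same dimension. The names `make_projector`, `project`, `ensemble_project`, and `cosine_distance` are hypothetical. Averaging the projector outputs and using a cosine-based loss are assumptions made for illustration, not details confirmed by this page. A real implementation would use a tensor library and train the projectors jointly with the student.

```python
import math
import random


def make_projector(in_dim, out_dim, rng):
    """Randomly initialised linear projector: an out_dim x in_dim weight matrix."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.uniform(-scale, scale) for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(weights, feature):
    """Apply one linear projector (matrix-vector product) to a feature vector."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]


def ensemble_project(projectors, feature):
    """Average the outputs of several projectors (assumed combination rule)."""
    outputs = [project(weights, feature) for weights in projectors]
    count = len(outputs)
    return [sum(values) / count for values in zip(*outputs)]


def cosine_distance(a, b):
    """1 - cosine similarity; an assumed stand-in for the feature loss."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


rng = random.Random(0)
dim = 8  # student and teacher share this dimension; a projector is used anyway

# Toy features standing in for one sample's student and teacher activations.
student_feature = [rng.gauss(0.0, 1.0) for _ in range(dim)]
teacher_feature = [rng.gauss(0.0, 1.0) for _ in range(dim)]

# Three independently initialised projectors form the ensemble.
projectors = [make_projector(dim, dim, rng) for _ in range(3)]

projected = ensemble_project(projectors, student_feature)
loss = cosine_distance(projected, teacher_feature)
print(f"feature distillation loss: {loss:.4f}")
```

In this sketch the loss is computed on the projected student feature rather than on the raw one, which is the setting the abstract describes for teacher and student networks with matching feature dimensions.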

Code & Models

Repositories

chenyd7/pefd (PyTorch, official)

Taxonomy

Topics: Machine Learning and Data Classification · Adversarial Robustness in Machine Learning · Neural Networks and Applications

Methods: Knowledge Distillation