Distilling Knowledge via Intermediate Classifiers

Aryan Asadian; Amirali Salehi-Abari

arXiv:2103.00497·cs.LG·June 1, 2021·1 cites

Open Access · 2 Repos

TL;DR

This paper proposes a novel knowledge distillation method using intermediate classifiers at various depths of the teacher model to better transfer knowledge to resource-limited students, especially when there is a large capacity gap.

Contribution

Introducing intermediate classifier heads at different depths of the teacher to create a cohort of heterogeneous teachers for improved knowledge transfer.

Findings

01. Outperforms canonical knowledge distillation methods.

02. Effective across various teacher-student pairs and datasets.

03. Mitigates capacity gap issues in knowledge transfer.

Abstract

The crux of knowledge distillation is to effectively train a resource-limited student model under the guidance of a larger pre-trained teacher model. However, when there is a large difference between the model complexities of the teacher and the student (i.e., a capacity gap), knowledge distillation loses its strength in transferring knowledge from the teacher to the student, resulting in a weaker student. To mitigate the impact of the capacity gap, we introduce knowledge distillation via intermediate heads. By extending the intermediate layers of the teacher (at various depths) with classifier heads, we cheaply acquire a cohort of heterogeneous pre-trained teachers. The intermediate classifier heads can be trained together efficiently while the backbone of the pre-trained teacher is kept frozen. The cohort of teachers (including the original teacher) co-teaches the student simultaneously. Our…
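The co-teaching idea in the abstract can be illustrated with a small loss computation. The following is only a sketch under stated assumptions, not the authors' implementation: the abstract does not specify how the cohort's signals are combined, so this example assumes a uniform average of temperature-softened KL terms (one per teacher head) mixed with the usual cross-entropy on the true label. The function name `cohort_distillation_loss` and the parameters `temperature` and `alpha` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cohort_distillation_loss(student_logits, cohort_logits, label,
                             temperature=4.0, alpha=0.5):
    """Assumed loss: alpha * CE(student, label) + (1 - alpha) * mean KD term.

    cohort_logits holds one logit vector per teacher in the cohort: the
    intermediate classifier heads plus the original (final) teacher head.
    Uniform averaging over the cohort is an assumption of this sketch.
    """
    student_soft = softmax(student_logits, temperature)
    kd_terms = [
        kl_divergence(softmax(teacher_logits, temperature), student_soft)
        for teacher_logits in cohort_logits
    ]
    # The T^2 factor is the scaling commonly used with softened targets.
    kd = (temperature ** 2) * sum(kd_terms) / len(kd_terms)
    ce = -math.log(softmax(student_logits)[label] + 1e-12)
    return alpha * ce + (1 - alpha) * kd


# Toy example: two intermediate heads and the final teacher head, 3 classes.
cohort = [
    [1.0, 0.8, -0.5],   # shallow head: least confident
    [2.0, 0.5, -1.0],   # deeper head
    [4.0, 0.2, -2.0],   # original teacher output
]
loss = cohort_distillation_loss([1.5, 0.3, -0.7], cohort, label=0)
```

In this sketch the shallower heads produce softer, less confident targets than the final head, which is one way to read the "heterogeneous teachers" described above; other weightings of the cohort are equally compatible with the abstract.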

Taxonomy

Topics: Online Learning and Analytics · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification

Methods: Knowledge Distillation