Generalizing Teacher Networks for Effective Knowledge Distillation Across Student Architectures

Kuluhan Binici; Weiming Wu; Tulika Mitra

arXiv:2407.16040·cs.LG·January 9, 2025

TL;DR

This paper introduces a generic teacher network that is trained once and can then transfer knowledge effectively to any student architecture drawn from a given finite pool. This removes the need to re-customize the teacher for each student and lowers the cost of model compression.

Contribution

The paper proposes a one-off KD-aware training method to create a generic teacher capable of effective knowledge transfer across multiple student architectures.

Findings

01. Improves knowledge distillation effectiveness across diverse student models.

02. Reduces training cost by amortizing the one-off cost of training the generic teacher across multiple students.

03. Enhances flexibility in deploying compressed models on different hardware.

Abstract

Knowledge distillation (KD) is a model compression method that entails training a compact student model to emulate the performance of a more complex teacher model. However, the architectural capacity gap between the two models limits the effectiveness of knowledge transfer. Addressing this issue, previous works focused on customizing teacher-student pairs to improve compatibility, a computationally expensive process that needs to be repeated every time either model changes. Hence, these methods are impractical when a teacher model has to be compressed into different student models for deployment on multiple hardware devices with distinct resource constraints. In this work, we propose Generic Teacher Network (GTN), a one-off KD-aware training to create a generic teacher capable of effectively transferring knowledge to any student model sampled from a given finite pool of architectures.…
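For readers new to KD, the teacher-to-student transfer described in the abstract is commonly implemented as a temperature-softened KL divergence between teacher and student output distributions. The block below is a minimal sketch of that standard formulation only; it is not the GTN training procedure, and the function names `log_softmax` and `kd_loss` and the default temperature are illustrative assumptions rather than anything taken from the paper or its repository.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum_exp for z in scaled]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: this is the generic soft-target distillation loss,
    not the loss used to train the generic teacher in the paper.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]
    student = [1.5, 1.2, 0.3]
    print(kd_loss(student, teacher))
```

The loss is zero when the student reproduces the teacher's softened distribution exactly and grows as the two diverge. In practice it is usually combined with a cross-entropy term on the ground-truth labels.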

Code & Models

Repositories

kuluhan/gtn (PyTorch, official)

Taxonomy

Methods: ALIGN