Adaptive Multi-Teacher Multi-level Knowledge Distillation

Yuang Liu; Wei Zhang; Jun Wang

arXiv:2103.04062 · cs.CV · March 9, 2021

TL;DR

This paper introduces AMTML-KD, a novel framework for knowledge distillation that adaptively learns from multiple teachers at different levels, improving student network performance.

Contribution

It proposes an adaptive multi-teacher, multi-level distillation method that assigns instance-level importance weights to teachers and gathers intermediate hints from multiple teachers.
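As a rough illustration of gathering intermediate hints from several teachers, the sketch below combines a FitNets-style squared-error hint per teacher into one weighted loss. This is a minimal, hypothetical sketch, not the paper's multi-level scheme: the helper names `mse` and `multi_teacher_hint_loss`, the use of plain mean squared error, and the assumption that the student feature has already been projected to the teachers' feature size (e.g. by a learned adapter) are all illustrative choices.

```python
from typing import List, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multi_teacher_hint_loss(
    student_feat: Sequence[float],
    teacher_feats: List[Sequence[float]],
    weights: Sequence[float],
) -> float:
    """Weighted sum of per-teacher hint losses (hypothetical helper).

    Assumes `student_feat` is already projected to the teachers' feature
    size and that `weights` holds one non-negative weight per teacher.
    """
    if len(teacher_feats) != len(weights):
        raise ValueError("need one weight per teacher")
    return sum(w * mse(student_feat, t) for w, t in zip(weights, teacher_feats))


# Usage: the first teacher's hint matches exactly, the second differs by 2 in one dimension.
loss = multi_teacher_hint_loss([1.0, 2.0], [[1.0, 2.0], [3.0, 2.0]], [0.5, 0.5])
print(loss)  # 0.5 * 0.0 + 0.5 * 2.0 = 1.0
```

Passing per-instance weights (rather than fixed ones) is what would let such a hint loss follow the same adaptive teacher importance used for the soft targets.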

Findings

01

Student models outperform strong competitors on public datasets.

02

Adaptive weighting improves the relevance of teacher knowledge.

03

Multi-level distillation enhances learning effectiveness.

Abstract

Knowledge distillation (KD) is an effective learning paradigm for improving the performance of lightweight student networks by utilizing additional supervision knowledge distilled from teacher networks. Most pioneering studies either learn from only a single teacher in their distillation learning methods, neglecting the potential that a student can learn from multiple teachers simultaneously, or simply treat all teachers as equally important, unable to reveal the different importance of teachers for specific examples. To bridge this gap, we propose a novel adaptive multi-teacher multi-level knowledge distillation learning framework (AMTML-KD), which consists of two novel insights: (i) associating each teacher with a latent representation to adaptively learn instance-level teacher importance weights which are leveraged for acquiring integrated soft-targets (high-level knowledge) and (ii)…
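To make insight (i) more concrete, here is a minimal sketch of instance-level teacher weighting feeding an integrated soft target. It assumes, for illustration only, that each teacher's importance for an instance is a softmax over dot products between a per-instance feature vector and a per-teacher latent vector; the actual scoring function in AMTML-KD may differ. The function names (`teacher_weights`, `integrated_soft_target`, `kd_loss`) are hypothetical, and the loss is the standard temperature-scaled KL divergence from Hinton-style distillation, used here as a stand-in.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def teacher_weights(
    instance_feat: Sequence[float], teacher_latents: List[Sequence[float]]
) -> List[float]:
    """Instance-level teacher importance (assumed dot-product scoring)."""
    scores = [sum(h * v for h, v in zip(instance_feat, latent)) for latent in teacher_latents]
    return softmax(scores)


def integrated_soft_target(
    teacher_logits: List[Sequence[float]], weights: Sequence[float], temperature: float
) -> List[float]:
    """Weighted mixture of the teachers' temperature-softened class distributions."""
    probs = [softmax(z, temperature) for z in teacher_logits]
    num_classes = len(probs[0])
    return [sum(w * p[c] for w, p in zip(weights, probs)) for c in range(num_classes)]


def kd_loss(
    student_logits: Sequence[float], soft_target: Sequence[float], temperature: float
) -> float:
    """T^2-scaled KL(soft_target || softened student distribution)."""
    q = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(soft_target, q) if t > 0)
    return temperature ** 2 * kl


# Usage: the instance feature aligns with teacher 0's latent, so teacher 0 gets more weight.
w = teacher_weights([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
target = integrated_soft_target([[2.0, 0.0], [0.0, 2.0]], w, temperature=2.0)
print(w, target, kd_loss([0.0, 0.0], target, temperature=2.0))
```

Because the weights are recomputed per instance, a teacher that is more relevant for a given example contributes more to that example's soft target, which is the behavior the abstract describes; in a real training loop the latent vectors would be learned jointly with the student.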

Code & Models

Repositories

FLHonker/AMTML-KD-code
PyTorch · Official

Taxonomy

Methods: Knowledge Distillation