Multi-level Knowledge Distillation via Knowledge Alignment and Correlation

Fei Ding; Yin Yang; Hongxin Hu; Venkat Krovi; Feng Luo

arXiv:2012.00573 · cs.CV · June 7, 2021 · 1 citation

TL;DR

This paper introduces Multi-level Knowledge Distillation (MLKD), which transfers both knowledge alignment (per-sample agreement between teacher and student) and knowledge correlation (relations between different samples) to improve model compression and the transferability of learned representations.

Contribution

MLKD effectively integrates both alignment and correlation knowledge transfer, improving upon existing KD methods for diverse models and tasks.

Findings

01. MLKD outperforms state-of-the-art KD methods in multiple experimental settings.

02. MLKD improves the reliability and transferability of learned representations.

03. MLKD is task-agnostic and model-agnostic, compatible with various pretraining strategies.

Abstract

Knowledge distillation (KD) has become an important technique for model compression and knowledge transfer. In this work, we first perform a comprehensive analysis of the knowledge transferred by different KD methods. We demonstrate that traditional KD methods, which minimize the KL divergence of softmax outputs between networks, are related to the knowledge alignment of an individual sample only. Meanwhile, recent contrastive learning-based KD methods mainly transfer relational knowledge between different samples, namely, knowledge correlation. Because it is important to transfer the full knowledge from teacher to student, we introduce Multi-level Knowledge Distillation (MLKD), which considers both knowledge alignment and correlation. MLKD is task-agnostic and model-agnostic, and can easily transfer knowledge from supervised or self-supervised pretrained teachers. We show…
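The two kinds of knowledge named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the MLKD loss from the paper or the ifding/MLKD repository. The alignment term follows the standard temperature-softened KL formulation, and the correlation term is a simple stand-in (mean squared difference between pairwise cosine-similarity matrices) rather than a contrastive objective. All function names, the temperature value, and the toy inputs are assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-softened softmax, shifted by the max for numerical stability.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def alignment_loss(teacher_logits, student_logits, temperature=4.0):
    # Knowledge alignment: each sample is compared only with itself,
    # teacher output vs. student output, then averaged over the batch.
    losses = [
        kl_divergence(softmax(t, temperature), softmax(s, temperature))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(losses) / len(losses)

def cosine(u, v, eps=1e-12):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)

def similarity_matrix(features):
    # Batch x batch matrix of pairwise cosine similarities.
    return [[cosine(u, v) for v in features] for u in features]

def correlation_loss(teacher_features, student_features):
    # Knowledge correlation (illustrative stand-in): match the relations
    # between different samples, not the individual sample outputs.
    sim_t = similarity_matrix(teacher_features)
    sim_s = similarity_matrix(student_features)
    n = len(sim_t)
    return sum(
        (sim_t[i][j] - sim_s[i][j]) ** 2 for i in range(n) for j in range(n)
    ) / (n * n)

# Hypothetical toy batch of two samples.
teacher_logits = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
student_logits = [[1.5, 0.7, -0.5], [0.0, 1.0, 0.6]]
teacher_feats = [[1.0, 0.0], [0.6, 0.8]]            # 2-d teacher embeddings
student_feats = [[0.9, 0.1, 0.0], [0.5, 0.7, 0.2]]  # 3-d student embeddings

total = alignment_loss(teacher_logits, student_logits) + correlation_loss(
    teacher_feats, student_feats
)
print(total)
```

Because the correlation term compares batch-by-batch similarity matrices, the teacher and student embeddings may have different dimensions, as in the toy inputs above. How the two terms are weighted and combined in the actual method is not stated in this summary, so the unweighted sum here is only a placeholder.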

Code & Models

Repositories

ifding/MLKD (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning

Methods: Knowledge Distillation · Softmax