Improved Knowledge Distillation via Full Kernel Matrix Transfer

Qi Qian; Hao Li; Juhua Hu

arXiv:2009.14416 · cs.LG · March 31, 2022

TL;DR

This paper introduces a knowledge distillation method that transfers the full pairwise similarity matrix over the data set from the teacher model to the student model, using the Nyström method to keep the cost of handling that matrix manageable.

Contribution

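It proposes transferring the full kernel (similarity) matrix from teacher to student through a Nyström approximation, rather than only the small matrix formed within each mini-batch, so that pairwise similarity over the whole data set can be distilled without a cost that is quadratic in the number of examples.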

Findings

01. Effective transfer of full similarity matrix improves student model performance.

02. Nyström method reduces computational complexity from quadratic to linear.

03. Empirical results demonstrate superior performance on benchmark datasets.
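The second finding can be illustrated with a minimal sketch of a generic Nyström approximation. This is not the authors' implementation: the linear kernel, the random choice of landmark examples, and the names `nystrom_factors`, `features`, and `landmark_idx` are all assumptions made for illustration. The point is that only an n × m block and an m × m block need to be stored, instead of the full n × n matrix.

```python
import numpy as np

def nystrom_factors(features, landmark_idx):
    """Return (C, W_pinv) such that K is approximated by C @ W_pinv @ C.T.

    Assumes a linear kernel k(x, y) = <x, y>; this is an illustrative
    choice, not necessarily the similarity used in the paper.
    """
    landmarks = features[landmark_idx]           # (m, d) landmark examples
    C = features @ landmarks.T                   # (n, m) similarities to landmarks
    W = landmarks @ landmarks.T                  # (m, m) landmark-landmark block
    W_pinv = np.linalg.pinv(W, rcond=1e-10)      # pseudo-inverse handles rank deficiency
    return C, W_pinv

rng = np.random.default_rng(0)
n, d, m = 200, 4, 8                              # hypothetical sizes
features = rng.normal(size=(n, d))
landmark_idx = rng.choice(n, size=m, replace=False)

C, W_pinv = nystrom_factors(features, landmark_idx)

# The full matrix is built here only to check the approximation;
# in practice the factors C and W_pinv are what would be kept.
K_full = features @ features.T
K_approx = C @ W_pinv @ C.T
max_abs_error = np.abs(K_full - K_approx).max()

stored_entries = C.size + W_pinv.size            # n*m + m*m, linear in n
full_entries = K_full.size                       # n*n, quadratic in n
```

With a linear kernel and at least as many landmarks as feature dimensions, the reconstruction is typically exact up to rounding; with other kernels or fewer landmarks it is only an approximation.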

Abstract

Knowledge distillation is an effective approach to model compression in deep learning. Given a large model (i.e., teacher model), it aims to improve the performance of a compact model (i.e., student model) by transferring information from the teacher. Various kinds of information have been studied for distillation. Recently, a number of works have proposed transferring the pairwise similarity between examples to distill relative information. However, most efforts are devoted to developing different similarity measurements, while only a small matrix consisting of the examples within a mini-batch is transferred at each iteration, which can be inefficient for optimizing the pairwise similarity over the whole data set. In this work, we aim to transfer the full similarity matrix effectively. The main challenge comes from the size of the full matrix, which is quadratic in the number of examples. To address the…
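The mini-batch practice that the abstract contrasts against can be sketched as follows. This is a generic illustration, not code from the paper or its repository: cosine similarity, the mean squared error, and the names `batch_similarity` and `similarity_transfer_loss` are assumptions. Note that teacher and student features may have different dimensions, since only the batch-by-batch similarity matrices are compared.

```python
import numpy as np

def batch_similarity(features):
    """Cosine similarity matrix for one mini-batch (assumed similarity measure)."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    return normed @ normed.T                     # (batch, batch)

def similarity_transfer_loss(student_feats, teacher_feats):
    """Mean squared difference between student and teacher batch similarities."""
    diff = batch_similarity(student_feats) - batch_similarity(teacher_feats)
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
batch_size = 16                                  # hypothetical sizes
teacher_batch = rng.normal(size=(batch_size, 64))
student_batch = rng.normal(size=(batch_size, 32))

loss = similarity_transfer_loss(student_batch, teacher_batch)
self_loss = similarity_transfer_loss(teacher_batch, teacher_batch)
```

Each iteration of this scheme touches only a batch-sized block of the similarity matrix, which is the limitation the abstract points to when it argues for transferring the full matrix.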

Code & Models

Repositories

idstcv/kda
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Machine Learning and ELM

Methods: Knowledge Distillation