Improved Knowledge Distillation via Full Kernel Matrix Transfer
Qi Qian, Hao Li, Juhua Hu

TL;DR
This paper introduces an efficient method for knowledge distillation that transfers the full similarity matrix between teacher and student models using Nyström decomposition, improving performance and efficiency.
Contribution
It proposes a novel approach to transfer the full kernel similarity matrix via Nyström approximation, enhancing distillation effectiveness and computational efficiency.
Findings
Transferring the full similarity matrix improves student model performance.
The Nyström method reduces the cost of handling the full similarity matrix from quadratic to linear in the number of examples.
Empirical results demonstrate superior performance on benchmark datasets.
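The quadratic-to-linear claim can be illustrated with a generic Nyström approximation. This is a minimal sketch, not the authors' implementation: the RBF kernel, the random landmark selection, and the names `rbf_kernel` and `nystrom_factors` are assumptions made for illustration, since this summary does not specify the paper's kernel, landmark choice, or loss.

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.05):
    # Pairwise RBF similarities between rows of a and rows of b (assumed kernel).
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_factors(x, landmarks, gamma=0.05):
    # Nystrom: K is approximated by C @ pinv(W) @ C.T, where
    # C is the n x m similarity to the landmarks and W is the m x m landmark kernel.
    c = rbf_kernel(x, landmarks, gamma)
    w = rbf_kernel(landmarks, landmarks, gamma)
    return c, np.linalg.pinv(w)

rng = np.random.default_rng(0)
n, m = 200, 20
x = rng.normal(size=(n, 8))                  # hypothetical feature vectors
idx = rng.choice(n, size=m, replace=False)   # random landmarks (an assumption)

c, w_pinv = nystrom_factors(x, x[idx])

# The full matrix is built here only to check the approximation;
# in practice only c (n x m) and w_pinv (m x m) would be stored.
k_full = rbf_kernel(x, x)
k_approx = c @ w_pinv @ c.T
rel_err = np.linalg.norm(k_full - k_approx) / np.linalg.norm(k_full)
print(f"stored entries: {c.size + w_pinv.size} vs full {k_full.size}")
print(f"relative Frobenius error: {rel_err:.3f}")
```

With a fixed number of landmarks m, the stored factors grow as n·m + m² rather than n², which is the sense in which the cost becomes linear in the number of examples.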
Abstract
Knowledge distillation is an effective approach to model compression in deep learning. Given a large model (i.e., teacher model), it aims to improve the performance of a compact model (i.e., student model) by transferring information from the teacher. Various kinds of information for distillation have been studied. Recently, a number of works have proposed transferring the pairwise similarity between examples to distill relative information. However, most efforts are devoted to developing different similarity measurements, while only a small matrix consisting of examples within a mini-batch is transferred at each iteration, which can be inefficient for optimizing the pairwise similarity over the whole data set. In this work, we aim to transfer the full similarity matrix effectively. The main challenge comes from the size of the full matrix, which is quadratic in the number of examples. To address the…
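The mini-batch similarity transfer that the abstract contrasts against can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity and a mean squared error between the two batch matrices are placeholder choices (the cited works use various similarity measurements), and `cosine_similarity_matrix` and `batch_similarity_loss` are hypothetical names.

```python
import numpy as np

def cosine_similarity_matrix(feats):
    # Row-normalize features, then take all pairwise inner products (batch x batch).
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return normed @ normed.T

def batch_similarity_loss(student_feats, teacher_feats):
    # Mean squared difference between student and teacher similarity matrices.
    # Feature dimensions may differ; only the batch size must match.
    s = cosine_similarity_matrix(student_feats)
    t = cosine_similarity_matrix(teacher_feats)
    return float(np.mean((s - t) ** 2))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(32, 128))  # hypothetical teacher features for one mini-batch
student = rng.normal(size=(32, 16))   # hypothetical student features for the same batch
loss = batch_similarity_loss(student, teacher)
print(f"batch similarity loss: {loss:.4f}")
```

Each step of this kind only sees a 32 x 32 block of the full n x n similarity matrix, which is the inefficiency the abstract points to.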
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Machine Learning and ELM
Methods: Knowledge Distillation
