Similarity Transfer for Knowledge Distillation

Haoran Zhao; Kun Gong; Xin Sun; Junyu Dong; Hui Yu

arXiv:2103.10047 · cs.CV · March 19, 2021 · 1 citation

TL;DR

This paper introduces a novel similarity transfer method for knowledge distillation that leverages instance similarity correlations and mixup techniques to improve the performance of compact neural networks.

Contribution

It proposes a new distillation approach that fully utilizes category similarities and instance correlations through mixup, outperforming existing methods.

Findings

1. Significant accuracy improvements on CIFAR-10, CIFAR-100, CINIC-10, and Tiny-ImageNet.
2. Outperforms vanilla knowledge distillation and state-of-the-art methods.
3. Effective use of virtual samples created by mixup enhances student model performance.

Abstract

Knowledge distillation is a popular paradigm for learning portable neural networks by transferring knowledge from a large model to a smaller one. Most existing approaches enhance the student model by using the instance-level similarity information between categories provided by the teacher model. However, these works ignore the similarity correlation between different instances, which plays an important role in confidence prediction. To tackle this issue, we propose a novel method called similarity transfer for knowledge distillation (STKD), which aims to fully utilize the similarities between the categories of multiple samples. Furthermore, we propose to better capture the similarity correlation between different instances with the mixup technique, which creates virtual samples by weighted linear interpolation. Note that our distillation loss can fully…
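The two ingredients named in the abstract can be sketched as follows: mixup as a weighted linear interpolation of two inputs, and a vanilla temperature-softened distillation loss evaluated on the resulting virtual samples. This is a minimal illustration under stated assumptions, not the paper's STKD loss, which the truncated abstract does not specify. The function names, the linear stand-in "teacher" and "student", the Beta(1, 1) mixing coefficient, and the temperature of 4 are all hypothetical choices made for the example.

```python
import numpy as np

def mixup(x_a, x_b, lam):
    """Virtual sample: weighted linear interpolation of two inputs."""
    return lam * x_a + (1.0 - lam) * x_b

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Vanilla distillation loss (an assumption, not STKD):
    KL(teacher || student) on softened outputs, scaled by T^2."""
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl = np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=-1)
    return float(temperature ** 2 * kl.mean())

rng = np.random.default_rng(0)

# A toy batch of 8 "images" (3x32x32) and a shuffled copy to mix with.
x = rng.normal(size=(8, 3, 32, 32))
perm = rng.permutation(len(x))
lam = rng.beta(1.0, 1.0)          # hypothetical mixing coefficient
x_mix = mixup(x, x[perm], lam)

# Hypothetical linear stand-ins for the teacher and student networks.
W_teacher = 0.01 * rng.normal(size=(3 * 32 * 32, 10))
W_student = 0.01 * rng.normal(size=(3 * 32 * 32, 10))
flat = x_mix.reshape(len(x_mix), -1)
teacher_logits = flat @ W_teacher
student_logits = flat @ W_student

# The student is trained to match the teacher on the virtual samples.
loss = kd_loss(student_logits, teacher_logits)
print(f"lambda={lam:.3f}, distillation loss on mixed batch={loss:.4f}")
```

Because each virtual sample blends two instances, the teacher's softened output on it carries information about both, which is one plausible reading of how mixup exposes correlations between instances to the student.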

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning

Methods: Knowledge Distillation · Mixup