Knowledge Distillation from A Stronger Teacher

Tao Huang; Shan You; Fei Wang; Chen Qian; Chang Xu

arXiv:2205.10536 · cs.CV · December 29, 2022 · 95 cites

TL;DR

This paper introduces DIST, a knowledge distillation method that learns from stronger teachers by preserving the inter-class and intra-class relations in the teacher's predictions rather than matching them exactly, improving student performance across multiple vision tasks.

Contribution

Proposes a novel correlation-based relational loss for distillation from stronger teachers, addressing prediction discrepancy issues in existing methods.
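One way to read "correlation-based" is as a Pearson correlation distance between the teacher's and the student's class-probability vectors. This summary does not give the formula, so the symbols below ($u$, $v$, $C$, $\rho_p$, $d_p$) are assumptions chosen for illustration, not notation confirmed by this page.

```latex
% Assumed notation: u, v are C-dimensional prediction vectors
% (e.g. student and teacher class probabilities for one instance).
\rho_p(u, v) =
  \frac{\sum_{i=1}^{C} (u_i - \bar{u})(v_i - \bar{v})}
       {\sqrt{\sum_{i=1}^{C} (u_i - \bar{u})^2}\,
        \sqrt{\sum_{i=1}^{C} (v_i - \bar{v})^2}},
\qquad
d_p(u, v) = 1 - \rho_p(u, v).

% Pearson correlation is invariant to positive scaling and shifting:
\rho_p(a\,u + b,\; v) = \rho_p(u, v) \quad \text{for } a > 0,
\qquad\text{so}\qquad
d_p(u,\; a\,u + b) = 0 .
```

Under this reading, the distance is zero whenever the student's predictions are a positively scaled and shifted copy of the teacher's, whereas KL divergence is zero only when the two distributions are identical. That is one plausible sense in which a relational loss is a looser target than an exact match.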

Findings

01. Achieves state-of-the-art results on image classification, object detection, and segmentation.

02. Effective across various architectures and training strategies.

03. Improves student performance by capturing intrinsic inter-class and intra-class relations.

Abstract

Unlike existing knowledge distillation methods, which focus on baseline settings where the teacher models and training strategies are not as strong or competitive as state-of-the-art approaches, this paper presents a method dubbed DIST to distill better from a stronger teacher. We empirically find that the discrepancy between the predictions of the student and a stronger teacher tends to be considerably more severe. As a result, exactly matching predictions via KL divergence disturbs training and makes existing methods perform poorly. In this paper, we show that simply preserving the relations between the predictions of teacher and student suffices, and propose a correlation-based loss to capture the intrinsic inter-class relations from the teacher explicitly. Besides, considering that different instances have different semantic similarities to each class, we also extend this…
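The relational idea in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the correlation is Pearson correlation, that predictions are softmax probabilities, and that the extension cut off above applies the same distance across instances for each class (inferred from the "intra-class relations" wording in the findings). The names `relation_loss`, `pearson_distance`, `temperature`, `beta`, and `gamma` are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn one row of logits into probabilities (temperature is an assumed knob)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pearson_distance(u, v, eps=1e-12):
    """Return 1 - Pearson correlation of two equal-length vectors."""
    mu_u = sum(u) / len(u)
    mu_v = sum(v) / len(v)
    du = [x - mu_u for x in u]
    dv = [x - mu_v for x in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv)) + eps
    return 1.0 - num / den

def relation_loss(student_logits, teacher_logits,
                  temperature=1.0, beta=1.0, gamma=1.0):
    """Assumed form: beta * inter-class term + gamma * intra-class term."""
    p_s = [softmax(row, temperature) for row in student_logits]
    p_t = [softmax(row, temperature) for row in teacher_logits]
    # Inter-class: compare the class distribution of each instance (rows).
    inter = sum(pearson_distance(s, t) for s, t in zip(p_s, p_t)) / len(p_s)
    # Intra-class: compare each class's scores across the batch (columns).
    cols_s, cols_t = list(zip(*p_s)), list(zip(*p_t))
    intra = sum(pearson_distance(s, t)
                for s, t in zip(cols_s, cols_t)) / len(cols_s)
    return beta * inter + gamma * intra

# Toy batch: 4 instances, 3 classes (made-up logits for illustration only).
teacher = [[4.0, 1.0, 0.5], [0.2, 3.0, 1.0], [1.0, 0.5, 2.5], [2.0, 2.5, 0.1]]
reversed_student = [[-x for x in row] for row in teacher]

print(relation_loss(teacher, teacher))           # near zero: relations match
print(relation_loss(reversed_student, teacher))  # large: rankings are flipped
```

The loss depends only on how each prediction vector co-varies with the teacher's, not on the absolute probability values, which is the contrast with an exact KL match that the abstract draws.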

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning

Methods: Knowledge Distillation