Efficient and Robust Knowledge Distillation from A Stronger Teacher Based on Correlation Matching

Wenqi Niu; Yingchao Wang; Guohui Cai; Hanpo Hou

arXiv:2410.06561 · cs.LG · October 10, 2024

Open Access

TL;DR

This paper introduces a correlation-matching knowledge distillation (KD) method that improves the efficiency and robustness of transferring knowledge from a stronger teacher model to a student, addressing limitations of traditional KL-divergence-based KD.

Contribution

The paper proposes a KD approach that uses Pearson and Spearman correlations to better preserve inter-class relationships, which improves student performance and generalization.
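To make the two correlation measures concrete, the sketch below computes a Pearson-based and a Spearman-based distance between a student's and a teacher's class-probability vectors for a single sample. It is only an illustration under stated assumptions: the function names (`softmax`, `pearson`, `ranks`, `spearman`, `correlation_distance`) and the "1 minus correlation" form are hypothetical choices, not taken from the paper, and this summary does not say how the authors weight the two terms or make the rank-based term differentiable for training.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pearson(x, y):
    """Pearson correlation: linear agreement between two vectors.

    Raises ZeroDivisionError if either vector is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_distance(student_logits, teacher_logits):
    """Return (1 - Pearson, 1 - Spearman) between the two probability vectors.

    Assumption: "1 - correlation" is an illustrative distance, not the
    paper's exact loss.
    """
    p_student = softmax(student_logits)
    p_teacher = softmax(teacher_logits)
    return 1 - pearson(p_student, p_teacher), 1 - spearman(p_student, p_teacher)
```

In this sketch a student that orders the classes exactly as the teacher does has a Spearman distance of zero even when its probabilities differ, while the Pearson distance still reflects the mismatch in values. That is one way to read the claim that the student should preserve class relationships and not only match probability values.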

Findings

01. Achieves state-of-the-art results on CIFAR-100 and ImageNet datasets.
02. Effectively adapts to various teacher architectures and sizes.
03. Improves student model accuracy and robustness over traditional KD methods.

Abstract

Knowledge Distillation (KD) has emerged as a pivotal technique for neural network compression and performance enhancement. Most KD methods aim to transfer dark knowledge from a cumbersome teacher model to a lightweight student model based on Kullback-Leibler (KL) divergence loss. However, the student performance improvements achieved through KD exhibit diminishing marginal returns, where a stronger teacher model does not necessarily lead to a proportionally stronger student model. To address this issue, we empirically find that the KL-based KD method may implicitly change the inter-class relationships learned by the student model, resulting in a more complex and ambiguous decision boundary, which in turn reduces the model's accuracy and generalization ability. Therefore, this study argues that the student model should learn not only the probability values from the teacher's output but…
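For context, the KL-based objective that the abstract contrasts against is commonly formulated as a temperature-softened KL divergence between teacher and student outputs. The sketch below shows that conventional form for a single sample. The function names, the default temperature of 4.0, and the T² scaling are assumptions drawn from common practice, not details confirmed by this abstract.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_norm for s in scaled]


def kd_kl_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened distributions.

    Assumption: scaled by temperature**2, as is conventional, so the
    gradient magnitude stays comparable across temperatures.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    return temperature ** 2 * kl
```

This loss is zero only when the softened student distribution equals the teacher's, so it pushes the student toward the teacher's exact probability values. It carries no explicit term for the relative ordering of classes.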

Taxonomy

Topics: Educational Technology and Assessment

Methods: Knowledge Distillation