Complementary Relation Contrastive Distillation
Jinguo Zhu, Shixiang Tang, Dapeng Chen, Shijie Yu, Yakun Liu, Aijun Yang, Mingzhe Rong, Xiaohua Wang

TL;DR
This paper introduces Complementary Relation Contrastive Distillation (CRCD), a novel method for knowledge distillation that effectively transfers inter-sample relations from teacher to student models using a contrastive loss.
Contribution
CRCD leverages anchor-based relation estimation and models mutual relations with features and gradients, enhancing the distillation of structural knowledge.
Findings
CRCD outperforms existing methods on various benchmarks.
It effectively distills inter-sample relations and sample representations.
The approach improves student model performance significantly.
Abstract
Knowledge distillation aims to transfer representation ability from a teacher model to a student model. Previous approaches focus on either individual representation distillation or inter-sample similarity preservation. We argue, however, that the inter-sample relation conveys abundant information and needs to be distilled in a more effective way. In this paper, we propose a novel knowledge distillation method, namely Complementary Relation Contrastive Distillation (CRCD), to transfer the structural knowledge from the teacher to the student. Specifically, we estimate the mutual relation in an anchor-based way and distill the anchor-student relation under the supervision of its corresponding anchor-teacher relation. To make it more robust, mutual relations are modeled by two complementary elements: the feature and its gradient. Furthermore, the lower bound of mutual information between the…
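The anchor-based idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a relation is simply the L2-normalized difference between an anchor's teacher feature and a sample's feature, uses a generic InfoNCE-style loss as the contrastive objective, and leaves out the gradient element entirely. The function names, the temperature value, and the array shapes are all hypothetical choices made for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length; eps guards against division by zero."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def anchor_relations(anchor_feats, sample_feats):
    """Relation vectors for every (anchor, sample) pair.

    Assumption: a relation is the normalized feature difference.
    anchor_feats: (M, d), sample_feats: (N, d) -> returns (M * N, d).
    """
    diff = anchor_feats[:, None, :] - sample_feats[None, :, :]
    return l2_normalize(diff.reshape(-1, diff.shape[-1]))

def relation_contrastive_loss(teacher_anchor, teacher_feats, student_feats,
                              temperature=0.1):
    """InfoNCE-style loss over relations (illustrative stand-in).

    The anchor-student relation of pair (i, j) is pulled towards the
    anchor-teacher relation of the same pair and pushed away from the
    anchor-teacher relations of all other pairs.
    """
    r_t = anchor_relations(teacher_anchor, teacher_feats)  # anchor-teacher
    r_s = anchor_relations(teacher_anchor, student_feats)  # anchor-student
    logits = r_s @ r_t.T / temperature                     # (P, P), P = M * N
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))              # positives on diagonal

# Toy usage with random features standing in for network outputs.
rng = np.random.default_rng(0)
anchors = rng.normal(size=(2, 32))   # teacher features of 2 anchor samples
teacher = rng.normal(size=(4, 32))   # teacher features of 4 other samples
student = rng.normal(size=(4, 32))   # student features of the same 4 samples
loss = relation_contrastive_loss(anchors, teacher, student)
```

In this toy setting the loss shrinks as the student's features move so that its relations to the anchors match the teacher's, which is the behaviour the abstract describes as distilling the anchor-student relation under the supervision of the anchor-teacher relation.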
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Topic Modeling
Methods: Knowledge Distillation
