Knowledge Distillation via Token-level Relationship Graph
Shuoxi Zhang, Hanpeng Liu, Kun He

TL;DR
This paper introduces a novel knowledge distillation method called TRG that leverages token-level relationship graphs and contextual loss to improve semantic transfer from teacher to student models, especially in imbalanced data scenarios.
Contribution
The paper proposes a new token-level relationship graph approach and a contextual loss to enhance knowledge distillation, addressing limitations of previous instance-level methods.
Findings
TRG outperforms state-of-the-art methods on visual classification tasks.
The approach is effective on imbalanced datasets.
It establishes new performance benchmarks in knowledge distillation.
Abstract
Knowledge distillation is a powerful technique for transferring knowledge from a pre-trained teacher model to a student model. However, the true potential of knowledge transfer has not been fully explored. Existing approaches primarily focus on distilling individual information or instance-level relationships, overlooking the valuable information embedded in token-level relationships, which may be particularly affected by the long-tail effects. To address the above limitations, we propose a novel method called Knowledge Distillation with Token-level Relationship Graph (TRG) that leverages the token-wise relational knowledge to enhance the performance of knowledge distillation. By employing TRG, the student model can effectively emulate higher-level semantic information from the teacher model, resulting in improved distillation results. To further enhance the learning process, we…
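The abstract names the idea of a token-level relationship graph but, being truncated here, does not spell out how the graph is built or compared. As a rough illustration only, the sketch below assumes that tokens are plain feature vectors pooled from a mini-batch, that graph edges are pairwise cosine similarities, and that teacher and student graphs are compared with a mean squared error. The function names `cosine_graph` and `graph_matching_loss` and all of these design choices are hypothetical and are not taken from the paper.

```python
import math

def cosine_graph(tokens):
    """Dense pairwise cosine-similarity matrix over a list of token vectors.

    Assumption: edges are cosine similarities; the paper may define them differently.
    """
    # Guard against zero vectors by falling back to a norm of 1.0.
    norms = [math.sqrt(sum(x * x for x in t)) or 1.0 for t in tokens]
    unit = [[x / n for x in t] for t, n in zip(tokens, norms)]
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]

def graph_matching_loss(teacher_tokens, student_tokens):
    """Mean squared difference between the teacher and student token graphs.

    Assumption: MSE is used purely for illustration.
    """
    g_t = cosine_graph(teacher_tokens)
    g_s = cosine_graph(student_tokens)
    n = len(g_t)
    total = sum((g_t[i][j] - g_s[i][j]) ** 2 for i in range(n) for j in range(n))
    return total / (n * n)

# Toy tokens, imagined as pooled from every sample in a mini-batch.
teacher = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Same directions as the teacher tokens, different magnitudes.
student_same = [[2.0, 0.0], [0.0, 3.0], [0.5, 0.5]]
# All tokens collapsed onto one direction: relational structure is lost.
student_collapsed = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]

loss_same = graph_matching_loss(teacher, student_same)
loss_collapsed = graph_matching_loss(teacher, student_collapsed)
print(loss_same, loss_collapsed)
```

Because the graph is N by N in the number of tokens, a loss of this kind does not require the teacher and student feature dimensions to match, only the token counts. In this toy case the student that preserves the teacher's token directions incurs essentially zero loss, while the collapsed student is penalised.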
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Retinal Imaging and Analysis
Methods: Knowledge Distillation · Focus
