Less or More From Teacher: Exploiting Trilateral Geometry For Knowledge Distillation
Chengming Hu, Haolun Wu, Xuan Li, Chen Ma, Xi Chen, Jun Yan, Boyu Wang, Xue Liu

TL;DR
This paper proposes an adaptive, sample-wise knowledge fusion method for knowledge distillation that exploits trilateral geometric relations among teacher, student, and ground truth, improving performance across tasks.
Contribution
Introduces a novel adaptive fusion ratio learning approach using intra- and inter-sample relations, leveraging trilateral geometry for enhanced knowledge distillation.
Findings
Consistent improvements over existing methods in image classification.
Effective in attack detection and click-through rate prediction.
Adaptive ratio outperforms fixed or heuristic-based fusion strategies.
Abstract
Knowledge distillation aims to train a compact student network using soft supervision from a larger teacher network and hard supervision from ground truths. However, determining an optimal knowledge fusion ratio that balances these supervisory signals remains challenging. Prior methods generally resort to a constant or heuristic-based fusion ratio, which often falls short of a proper balance. In this study, we introduce a novel adaptive method for learning a sample-wise knowledge fusion ratio, exploiting both the correctness of teacher and student, as well as how well the student mimics the teacher on each sample. Our method naturally leads to the intra-sample trilateral geometric relations among the student prediction (S), teacher prediction (T), and ground truth (G). To counterbalance the impact of outliers, we further extend to the inter-sample relations, incorporating the…
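To make the idea of a sample-wise fusion ratio concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy probabilities, and the convention that `alpha` weights the ground-truth term while `1 - alpha` weights the teacher term are all assumptions made for illustration. The sketch only shows how one sample's loss changes with its ratio and how the three pairwise differences among student, teacher, and ground truth could be computed as inputs to some ratio predictor, which is left out here.

```python
import math

def cross_entropy(probs, label):
    """Hard-label loss: negative log-probability of the true class."""
    return -math.log(probs[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions, with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def trilateral_edges(student, teacher, label):
    """Pairwise differences among student (S), teacher (T), and one-hot truth (G).

    Hypothetical helper: returns the vectors S-G, T-G, and S-T for one sample.
    """
    truth = [1.0 if k == label else 0.0 for k in range(len(student))]
    s_g = [s - g for s, g in zip(student, truth)]
    t_g = [t - g for t, g in zip(teacher, truth)]
    s_t = [s - t for s, t in zip(student, teacher)]
    return s_g, t_g, s_t

def fused_loss(student, teacher, label, alpha):
    """Per-sample loss; alpha (assumed in [0, 1]) weights the ground-truth term."""
    hard = cross_entropy(student, label)      # supervision from the ground truth
    soft = kl_divergence(teacher, student)    # supervision from the teacher
    return alpha * hard + (1.0 - alpha) * soft

# Toy three-class sample (made-up numbers, not results from the paper).
student = [0.5, 0.3, 0.2]
teacher = [0.7, 0.2, 0.1]
label = 0

for alpha in (0.0, 0.5, 1.0):
    print(alpha, round(fused_loss(student, teacher, label, alpha), 4))
```

A constant-ratio baseline corresponds to passing the same `alpha` for every sample; the adaptive setting described in the abstract would instead supply a different value per sample.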
Taxonomy
Topics: Adversarial Robustness in Machine Learning · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning
Methods: Knowledge Distillation
