Confidence-Aware Multi-Teacher Knowledge Distillation
Hailin Zhang, Defang Chen, Can Wang

TL;DR
This paper introduces CA-MKD, a confidence-aware multi-teacher knowledge distillation method that adaptively weights teacher predictions based on their reliability, improving student model performance.
Contribution
The paper proposes a novel confidence-aware approach for multi-teacher knowledge distillation that considers prediction quality and incorporates intermediate layer knowledge.
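The confidence-aware weighting of teacher predictions can be illustrated with a minimal single-sample sketch. It assumes that each teacher's reliability is measured by the cross-entropy between its prediction and the ground-truth label, and that the weights are a softmax over the negated cross-entropies. Under that assumption, teachers whose predictions are closer to the one-hot label receive larger weights. The weighting rule, the temperature of 4.0, and all function names are illustrative assumptions, not the authors' exact formulation.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    # Numerically stable softmax over a list of scores.
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: List[float], label: int) -> float:
    # Cross-entropy between a predicted distribution and a one-hot label.
    return -math.log(max(probs[label], 1e-12))


def confidence_weights(teacher_logits: List[List[float]], label: int) -> List[float]:
    # ASSUMPTION: weight_k = softmax(-CE_k) across teachers, so a teacher with
    # lower cross-entropy w.r.t. the ground truth gets a larger weight.
    ces = [cross_entropy(softmax(z), label) for z in teacher_logits]
    return softmax([-ce for ce in ces])


def kl_div(p: List[float], q: List[float]) -> float:
    # KL(p || q): p is the teacher distribution, q the student distribution.
    return sum(pi * math.log(max(pi, 1e-12) / max(qi, 1e-12)) for pi, qi in zip(p, q))


def weighted_kd_loss(teacher_logits: List[List[float]],
                     student_logits: List[float],
                     label: int,
                     temperature: float = 4.0) -> float:
    # Confidence-weighted sum of per-teacher KL terms for one sample,
    # scaled by T^2 as in standard temperature-based distillation.
    weights = confidence_weights(teacher_logits, label)
    q = softmax(student_logits, temperature)
    loss = 0.0
    for w, z in zip(weights, teacher_logits):
        loss += w * kl_div(softmax(z, temperature), q)
    return loss * temperature ** 2
```

For example, with teacher logits `[[4.0, 0.5, 0.1], [0.2, 3.0, 0.1]]` and label `0`, the first teacher (which favors the correct class) receives the larger weight. A softmax keeps the weights non-negative and summing to one; the real method may normalize differently.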
Findings
CA-MKD outperforms state-of-the-art methods across various architectures.
Adaptive weighting based on confidence improves student learning.
Inclusion of intermediate layers stabilizes knowledge transfer.
Abstract
Knowledge distillation was initially introduced to utilize additional supervision from a single teacher model when training the student model. To boost student performance, some recent variants attempt to exploit diverse knowledge sources from multiple teachers. However, existing studies mainly integrate knowledge from these sources by averaging the teacher predictions or combining them with various other label-free strategies, which may mislead the student in the presence of low-quality teacher predictions. To tackle this problem, we propose Confidence-Aware Multi-teacher Knowledge Distillation (CA-MKD), which adaptively assigns a sample-wise reliability to each teacher prediction with the help of ground-truth labels, giving large weights to teacher predictions that are close to the one-hot labels. Besides, CA-MKD incorporates intermediate layers to stabilize the knowledge transfer…
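The abstract is truncated before it explains how intermediate layers are used, so the following is only a hypothetical sketch of one common way to add intermediate-layer knowledge. It projects the student feature through a per-teacher linear adapter and combines the per-teacher mean-squared errors using per-sample teacher weights, such as confidence weights. The adapters, the choice of MSE, and the reuse of teacher weights here are all assumptions for illustration, not details confirmed by the paper.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def project(weight: Matrix, feature: Sequence[float]) -> List[float]:
    # Hypothetical linear adapter: maps a student feature to a teacher's width.
    return [sum(w * x for w, x in zip(row, feature)) for row in weight]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    # Mean-squared error between two equal-length feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def weighted_feature_loss(student_feature: Sequence[float],
                          teacher_features: List[List[float]],
                          adapters: List[Matrix],
                          weights: List[float]) -> float:
    # ASSUMPTION: one adapter per teacher; weights are non-negative and sum
    # to 1 (e.g. per-sample confidence weights computed from predictions).
    assert len(teacher_features) == len(adapters) == len(weights)
    return sum(
        w * mse(project(a, student_feature), f)
        for w, a, f in zip(weights, adapters, teacher_features)
    )
```

With an identity adapter for both teachers, student feature `[1.0, 2.0]`, teacher features `[[1.0, 2.0], [3.0, 2.0]]`, and equal weights `[0.5, 0.5]`, the loss is `0.5 * 0.0 + 0.5 * 2.0 = 1.0`. In practice the adapters would be trainable layers, which this plain-Python sketch does not model.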
Taxonomy
Topics: Online Learning and Analytics · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification
Methods: Knowledge Distillation
