Prime-Aware Adaptive Distillation
Youcai Zhang, Zhonghao Lan, Yuchen Dai, Fangao Zeng, Yan Bai, Jie Chang, and Yichen Wei

TL;DR
Prime-Aware Adaptive Distillation (PAD) adds uncertainty-based adaptive sample weighting to knowledge distillation so that prime samples are emphasized during training. It improves student performance in classification, metric learning, and object detection, and outperforms recent state-of-the-art distillation methods.
Contribution
The paper proposes PAD, a distillation method that uses uncertainty learning to identify prime samples and adaptively emphasize them. Because existing methods treat all samples equally, PAD's unequal weighting can be used to refine them.
Findings
PAD improves performance across classification, metric learning, and object detection.
PAD outperforms recent state-of-the-art distillation methods.
PAD is versatile and effective with various teacher-student combinations.
Abstract
Knowledge distillation (KD) aims to improve the performance of a student network by mimicking the knowledge from a powerful teacher network. Existing methods focus on studying what knowledge should be transferred and treat all samples equally during training. This paper introduces adaptive sample weighting to KD. We discover that previously effective hard mining methods are not appropriate for distillation. Furthermore, we propose Prime-Aware Adaptive Distillation (PAD) by incorporating uncertainty learning. PAD perceives the prime samples in distillation and then emphasizes their effect adaptively. PAD is fundamentally different from existing methods and would refine them with the innovative view of unequal training. For this reason, PAD is versatile and has been applied in various tasks including classification, metric learning, and object detection. With ten teacher-student…
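The abstract does not state the loss itself. As a minimal sketch of what uncertainty-based adaptive weighting could look like, the example below assumes a standard heteroscedastic-style objective: the student predicts a per-sample log-variance, and each sample's squared feature distance to the teacher is scaled by the inverse variance, with a log-variance penalty to keep the variance from growing without bound. The function name, the feature-distance choice, and the `log_vars` input are all hypothetical and are not taken from this page.

```python
import math

def uncertainty_weighted_distill_loss(student_feats, teacher_feats, log_vars):
    """Sketch of a variance-weighted distillation loss (assumed form, not the paper's code).

    student_feats, teacher_feats: lists of equal-length float vectors, one per sample.
    log_vars: one predicted log(sigma^2) per sample (hypothetical extra student output).
    Returns (mean loss over samples, per-sample weights 1/sigma^2).
    """
    losses, weights = [], []
    for s, t, log_var in zip(student_feats, teacher_feats, log_vars):
        # Squared L2 distance between student and teacher features.
        sq_dist = sum((a - b) ** 2 for a, b in zip(s, t))
        # Low predicted variance -> large weight; high variance -> small weight.
        weight = math.exp(-log_var)
        # The + log_var term penalizes inflating the variance to shrink the loss.
        losses.append(weight * sq_dist + log_var)
        weights.append(weight)
    return sum(losses) / len(losses), weights


# Two toy samples: the second is given a larger predicted variance (sigma^2 = 4),
# so its distance term is down-weighted by a factor of 4.
loss, weights = uncertainty_weighted_distill_loss(
    [[1.0, 0.0], [0.0, 3.0]],
    [[1.0, 1.0], [0.0, 0.0]],
    [0.0, math.log(4.0)],
)
```

Under this assumed form, samples with low predicted variance contribute more to the gradient, which is one way to read "emphasizes their effect adaptively".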
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and Data Classification · Machine Learning and ELM
