Hint-dynamic Knowledge Distillation
Yiyang Liu, Chenxin Li, Xiaotong Tu, Xinghao Ding, Yue Huang

TL;DR
This paper introduces Hint-dynamic Knowledge Distillation (HKD), a novel method that adaptively utilizes teacher hints for each instance during training, improving the effectiveness of knowledge transfer in neural networks.
Contribution
HKD employs a meta-weight network and weight ensembling to dynamically and adaptively leverage teacher hints, enhancing knowledge distillation performance.
Findings
HKD outperforms existing methods on CIFAR-100 and Tiny-ImageNet.
Adaptive hint utilization improves student model accuracy.
Meta-weight and ensembling strategies effectively reduce bias.
Abstract
Knowledge Distillation (KD) transfers the knowledge from a high-capacity teacher model to promote a smaller student model. Existing efforts guide the distillation by matching their prediction logits, feature embeddings, etc., while leaving how to efficiently utilize them in conjunction less explored. In this paper, we propose Hint-dynamic Knowledge Distillation, dubbed HKD, which excavates the knowledge from the teacher's hints in a dynamic scheme. The guidance effect of the knowledge hints usually varies across instances and learning stages, which motivates us to customize a specific hint-learning manner for each instance adaptively. Specifically, a meta-weight network is introduced to generate instance-wise weight coefficients for the knowledge hints, conditioned on the dynamic learning progress of the student model. We further present a weight ensembling strategy to…
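To make the idea of instance-wise hint weighting concrete, here is a minimal NumPy sketch of how a meta-weight network could turn per-instance hint losses into per-instance weights. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the network's inputs or architecture, so the names `per_instance_hint_losses`, `MetaWeightNet`, and `weighted_hint_loss`, the choice of two hints (softened logits and features), the use of the student's current hint losses as input, the softmax normalization over hints, and the temperature `T=4.0` are all hypothetical. Training the meta-weight network and the weight ensembling strategy are omitted.

```python
import numpy as np


def log_softmax(x, axis=-1):
    # Numerically stable log-softmax.
    z = x - x.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def per_instance_hint_losses(s_logits, t_logits, s_feat, t_feat, T=4.0):
    """Return a (batch, 2) array: one loss per instance and per hint."""
    log_p_t = log_softmax(t_logits / T, axis=1)
    log_p_s = log_softmax(s_logits / T, axis=1)
    # Hint 1: KL(teacher || student) on temperature-softened predictions.
    kd = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=1) * T ** 2
    # Hint 2: mean squared error between student and teacher features.
    feat = ((s_feat - t_feat) ** 2).mean(axis=1)
    return np.stack([kd, feat], axis=1)


class MetaWeightNet:
    """Hypothetical two-layer MLP mapping hint losses to hint weights."""

    def __init__(self, n_hints, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (n_hints, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, n_hints))
        self.b2 = np.zeros(n_hints)

    def __call__(self, losses):
        h = np.maximum(losses @ self.w1 + self.b1, 0.0)  # ReLU
        # Assumption: weights are normalized over hints with a softmax,
        # so each instance's weights are positive and sum to 1.
        return np.exp(log_softmax(h @ self.w2 + self.b2, axis=1))


def weighted_hint_loss(losses, weights):
    # Weight each hint loss per instance, then average over the batch.
    return (weights * losses).sum(axis=1).mean()


# Toy usage with random tensors standing in for model outputs.
rng = np.random.default_rng(1)
s_logits, t_logits = rng.normal(size=(8, 100)), rng.normal(size=(8, 100))
s_feat, t_feat = rng.normal(size=(8, 64)), rng.normal(size=(8, 64))

losses = per_instance_hint_losses(s_logits, t_logits, s_feat, t_feat)
weights = MetaWeightNet(n_hints=2)(losses)
total = weighted_hint_loss(losses, weights)
```

Because the weights depend on the student's current losses, the same instance can receive a different mix of hints as training progresses, which is the behaviour the abstract describes.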
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
Methods: Knowledge Distillation
