Balance Divergence for Knowledge Distillation
Yafei Qi, Chen Wang, Zhaoning Zhang, Yaping Liu, Yongmin Zhang

TL;DR
This paper introduces Balance Divergence Distillation, a method that improves knowledge transfer from teacher to student networks by correcting the imbalance between the negative and positive information the student receives during distillation.
Contribution
The paper proposes a divergence-based approach that uses reverse Kullback-Leibler divergence to better model the small probabilities in the teacher network's output, improving the accuracy of lightweight student networks.
Findings
Achieves a 1-3% accuracy improvement on CIFAR-100 and ImageNet.
Improves mIoU by 4.55% on the Cityscapes dataset.
Effective across various computer vision tasks and distillation methods.
Abstract
Knowledge distillation has been widely adopted in computer vision tasks, since it can effectively enhance the performance of lightweight student networks by leveraging the knowledge transferred from cumbersome teacher networks. Most existing knowledge distillation methods use Kullback-Leibler divergence to make the student network's logit output probabilities mimic those of the teacher network. Nonetheless, these methods may neglect the negative parts of the teacher's "dark knowledge" because the divergence calculation may ignore the effect of the minute probabilities in the teacher's logit output. This deficiency may lead to suboptimal logit mimicry during distillation and result in an imbalance in the information acquired by the student network. In this paper, we investigate the impact of this imbalance and propose a novel method, named…
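The imbalance the abstract describes can be illustrated with the standard forward and reverse KL divergences. The sketch below is not the paper's Balance Divergence Distillation loss (the abstract is truncated before the method is defined); it only shows why a class with a minute teacher probability contributes almost nothing to forward KL, while reverse KL, which weights each term by the student probability, can penalize it heavily. The function names (softmax, kl_terms) and the logit values are hypothetical, chosen purely for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities (numerically stable)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_terms(p, q, eps=1e-12):
    """Per-class terms p_i * log(p_i / q_i); their sum is KL(p || q)."""
    return [pi * math.log(max(pi, eps) / max(qi, eps)) for pi, qi in zip(p, q)]


# Hypothetical logits: the teacher assigns a minute probability to the last
# class, while the student over-estimates it.
teacher_logits = [8.0, 2.0, 1.0, -3.0]
student_logits = [6.0, 3.0, 0.0, 1.0]

p_t = softmax(teacher_logits)
p_s = softmax(student_logits)

# Forward KL(teacher || student): each term is weighted by the teacher probability.
forward_terms = kl_terms(p_t, p_s)
# Reverse KL(student || teacher): each term is weighted by the student probability.
reverse_terms = kl_terms(p_s, p_t)

forward_kl = sum(forward_terms)
reverse_kl = sum(reverse_terms)

for i, (f, r) in enumerate(zip(forward_terms, reverse_terms)):
    print(f"class {i}: teacher p={p_t[i]:.6f}  forward term={f:+.6f}  reverse term={r:+.6f}")
print(f"forward KL={forward_kl:.6f}  reverse KL={reverse_kl:.6f}")
```

For the last class, the forward term is scaled by the teacher's tiny probability and is therefore close to zero even though the student is far off, whereas the reverse term for the same class is much larger in magnitude. How the paper combines or balances the two divergences is not stated in the text available here.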
Taxonomy
Methods: Knowledge Distillation
