Wasserstein Distance Rivals Kullback-Leibler Divergence for Knowledge Distillation
Jiaming Lv, Haoyuan Yang, Peihua Li

TL;DR
This paper introduces Wasserstein Distance-based methods for knowledge distillation. They address two limitations of KL-Div by enabling cross-category comparison and by handling non-overlapping distributions, which leads to improved performance in image classification and object detection tasks.
Contribution
The paper proposes Wasserstein Distance-based distillation techniques for logits and for features that outperform existing KL-Div methods in classification and detection tasks.
Findings
WKD-L outperforms KL-Div variants in logit distillation.
WKD-F surpasses KL-Div counterparts and state-of-the-art methods in feature distillation.
Wasserstein Distance effectively captures inter-category relations and distribution geometry.
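The cross-category point can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration: the three category names, the cost matrix values, and the use of entropy-regularized Sinkhorn iterations as the solver for discrete WD. It is not the paper's WKD-L implementation. The two students misplace the same amount of probability mass, one onto a category similar to the teacher's top class and one onto a dissimilar category.

```python
import numpy as np

def kl_div(p, q):
    """KL(p || q) for strictly positive discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def sinkhorn_wd(p, q, cost, eps=0.05, n_iters=500):
    """Approximate discrete Wasserstein distance via Sinkhorn iterations.

    eps and n_iters are illustrative choices, not values from the paper.
    """
    K = np.exp(-cost / eps)              # Gibbs kernel of the cost matrix
    u = np.ones_like(p)
    for _ in range(n_iters):
        v = q / (K.T @ u)                # scale columns to match q
        u = p / (K @ v)                  # scale rows to match p
    plan = u[:, None] * K * v[None, :]   # transport plan
    return float(np.sum(plan * cost))

# Hypothetical categories: 0 = cat, 1 = dog, 2 = truck.
# Hypothetical costs: cat and dog are close, truck is far from both.
cost = np.array([[0.0, 0.2, 1.0],
                 [0.2, 0.0, 1.0],
                 [1.0, 1.0, 0.0]])

teacher   = np.array([0.8, 0.1, 0.1])
student_a = np.array([0.5, 0.4, 0.1])   # extra mass on the similar class
student_b = np.array([0.5, 0.1, 0.4])   # extra mass on the dissimilar class

kl_a, kl_b = kl_div(teacher, student_a), kl_div(teacher, student_b)
wd_a, wd_b = (sinkhorn_wd(teacher, student_a, cost),
              sinkhorn_wd(teacher, student_b, cost))
```

Under these assumptions `kl_a` and `kl_b` coincide, because KL-Div only compares probabilities category by category, while `wd_a` is smaller than `wd_b`, because moving mass between similar categories is cheaper under the cost matrix.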
Abstract
Since the pioneering work of Hinton et al., knowledge distillation based on Kullback-Leibler Divergence (KL-Div) has been predominant, and recently its variants have achieved compelling performance. However, KL-Div only compares probabilities of the corresponding category between the teacher and student while lacking a mechanism for cross-category comparison. Besides, KL-Div is problematic when applied to intermediate layers, as it cannot handle non-overlapping distributions and is unaware of the geometry of the underlying manifold. To address these downsides, we propose a methodology of Wasserstein Distance (WD) based knowledge distillation. Specifically, we propose a logit distillation method called WKD-L based on discrete WD, which performs cross-category comparison of probabilities and thus can explicitly leverage rich interrelations among categories. Moreover, we introduce a feature…
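The claim about non-overlapping distributions can be checked on a toy case. The sketch below is an assumption-laden illustration, not the paper's feature distillation method: it uses one-dimensional histograms on an evenly spaced grid, where the 1-Wasserstein distance has a closed form as the summed absolute difference of the cumulative distributions. The histograms and the helper names `kl_div` and `w1_on_grid` are hypothetical.

```python
import math
import numpy as np

def kl_div(p, q):
    """KL(p || q); returns inf when q has no mass where p does."""
    mask = p > 0
    with np.errstate(divide="ignore"):
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def w1_on_grid(p, q, spacing=1.0):
    """1-Wasserstein distance between histograms on an evenly spaced 1D grid."""
    return float(np.sum(np.abs(np.cumsum(p) - np.cumsum(q))) * spacing)

# Three point masses on a 4-bin grid; none of them overlap.
p      = np.array([1.0, 0.0, 0.0, 0.0])
q_near = np.array([0.0, 1.0, 0.0, 0.0])   # one bin away from p
q_far  = np.array([0.0, 0.0, 0.0, 1.0])   # three bins away from p

kl_near, kl_far = kl_div(p, q_near), kl_div(p, q_far)
w_near, w_far = w1_on_grid(p, q_near), w1_on_grid(p, q_far)
```

In this toy setting KL-Div is infinite for both pairs and so cannot tell the near histogram from the far one, while the Wasserstein distance stays finite and grows with how far the mass has to travel.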
Taxonomy
Topics: Brain Tumor Detection and Classification · Neural Networks and Applications
Methods: Knowledge Distillation
