Decoupled Knowledge Distillation
Borui Zhao, Quan Cui, Renjie Song, Yiyu Qiu, Jiajun Liang

TL;DR
This paper introduces Decoupled Knowledge Distillation (DKD), which splits the classical logit-distillation loss into a target-class term (TCKD) and a non-target-class term (NCKD) so that the two can be balanced independently, making logit distillation more flexible and more effective than the classical coupled formulation.
Contribution
The paper reformulates the classical KD loss into two parts, TCKD and NCKD, studies the role of each part and the limitations of coupling them, and proposes DKD so that each part can contribute to knowledge transfer more effectively and flexibly.
Findings
DKD achieves comparable or better results than complex feature-based methods.
DKD improves training efficiency on CIFAR-100, ImageNet, and MS-COCO.
Classical KD suppresses the effectiveness of NCKD because of its coupled formulation.
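The coupling mentioned in the last finding can be written out as a short derivation. This is a sketch in assumed notation, not text from the page: $p^{T}$ and $p^{S}$ are the teacher and student softmax outputs, $t$ is the target class, $b = [p_t,\, 1 - p_t]$ is the binary target/non-target distribution, $\hat{p}_i = p_i / (1 - p_t)$ is the distribution over the non-target classes, and $\alpha$, $\beta$ are the two balancing weights.

```latex
\begin{align}
\mathrm{KD}
  &= \mathrm{KL}\!\left(p^{T} \,\|\, p^{S}\right)
   = p^{T}_{t} \log \frac{p^{T}_{t}}{p^{S}_{t}}
     + \sum_{i \neq t} p^{T}_{i} \log \frac{p^{T}_{i}}{p^{S}_{i}} \\
  &= \underbrace{\mathrm{KL}\!\left(b^{T} \,\|\, b^{S}\right)}_{\mathrm{TCKD}}
     + \left(1 - p^{T}_{t}\right)
       \underbrace{\mathrm{KL}\!\left(\hat{p}^{T} \,\|\, \hat{p}^{S}\right)}_{\mathrm{NCKD}} \\
\mathrm{DKD}
  &= \alpha \,\mathrm{TCKD} + \beta \,\mathrm{NCKD}
\end{align}
```

Read this way, the NCKD term in classical KD is scaled by $1 - p^{T}_{t}$, so the more confident the teacher is on the target class, the less the non-target term contributes. Replacing that factor with a free weight $\beta$ removes the dependence.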
Abstract
State-of-the-art distillation methods are mainly based on distilling deep features from intermediate layers, while the significance of logit distillation is greatly overlooked. To provide a novel viewpoint to study logit distillation, we reformulate the classical KD loss into two parts, i.e., target class knowledge distillation (TCKD) and non-target class knowledge distillation (NCKD). We empirically investigate and prove the effects of the two parts: TCKD transfers knowledge concerning the "difficulty" of training samples, while NCKD is the prominent reason why logit distillation works. More importantly, we reveal that the classical KD loss is a coupled formulation, which (1) suppresses the effectiveness of NCKD and (2) limits the flexibility to balance these two parts. To address these issues, we present Decoupled Knowledge Distillation (DKD), enabling TCKD and NCKD to play their…
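To make the reformulation concrete, the following is a minimal pure-Python sketch for a single sample. It is not the authors' implementation: the function names, the example logits, and the default `alpha` and `beta` values are illustrative assumptions, and the temperature-squared scaling commonly applied to distillation losses is left out for brevity.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def classical_kd(teacher_logits, student_logits, temperature=1.0):
    """Classical KD loss: KL between the full teacher and student distributions."""
    return kl(softmax(teacher_logits, temperature),
              softmax(student_logits, temperature))


def decoupled_terms(teacher_logits, student_logits, target, temperature=1.0):
    """Return (TCKD, NCKD, teacher probability of the target class)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)

    # TCKD: KL between binary (target vs. all non-target) distributions.
    b_t = [p_t[target], 1.0 - p_t[target]]
    b_s = [p_s[target], 1.0 - p_s[target]]
    tckd = kl(b_t, b_s)

    # NCKD: KL between distributions renormalised over non-target classes only.
    hat_t = [p / b_t[1] for i, p in enumerate(p_t) if i != target]
    hat_s = [p / b_s[1] for i, p in enumerate(p_s) if i != target]
    nckd = kl(hat_t, hat_s)

    return tckd, nckd, p_t[target]


def dkd_loss(teacher_logits, student_logits, target, temperature=1.0,
             alpha=1.0, beta=8.0):
    """Decoupled loss: alpha * TCKD + beta * NCKD.

    The default alpha and beta are illustrative values, not recommendations.
    """
    tckd, nckd, _ = decoupled_terms(teacher_logits, student_logits,
                                    target, temperature)
    return alpha * tckd + beta * nckd


if __name__ == "__main__":
    teacher = [5.0, 1.0, 0.5, -1.0]   # hypothetical teacher logits
    student = [2.0, 1.5, 0.0, -0.5]   # hypothetical student logits
    tckd, nckd, pt = decoupled_terms(teacher, student, target=0, temperature=4.0)
    kd = classical_kd(teacher, student, temperature=4.0)
    # Classical KD equals TCKD + (1 - p_t) * NCKD up to floating-point error.
    print(abs(kd - (tckd + (1.0 - pt) * nckd)) < 1e-9)
    print(dkd_loss(teacher, student, target=0, temperature=4.0))
```

In the classical loss the NCKD weight is tied to the teacher's confidence on the target class, whereas `dkd_loss` exposes it as a free hyperparameter. That is the flexibility the abstract refers to when it says the coupled formulation limits the balance between the two parts.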
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI
Methods: Knowledge Distillation
