Scale Decoupled Distillation
Shicai Wei, Chunbo Luo, Yang Luo

TL;DR
This paper introduces Scale Decoupled Distillation (SDD), a novel logit knowledge distillation method that decouples global logits into local parts to transfer more precise semantic knowledge and improve student model performance, especially in fine-grained tasks.
Contribution
The paper proposes SDD, which decouples logits into local components and separates consistent and complementary knowledge, enhancing the effectiveness of logit-based knowledge distillation.
Findings
SDD outperforms existing methods on benchmark datasets.
Decoupling logits improves transfer of fine-grained semantic knowledge.
Focusing on complementary knowledge enhances discrimination of ambiguous samples.
Abstract
Logit knowledge distillation has attracted increasing attention in recent studies due to its practicality. However, it often suffers from inferior performance compared to feature knowledge distillation. In this paper, we argue that existing logit-based methods may be sub-optimal since they only leverage the global logit output, which couples multiple kinds of semantic knowledge. This may transfer ambiguous knowledge to the student and mislead its learning. To this end, we propose a simple but effective method, i.e., Scale Decoupled Distillation (SDD), for logit knowledge distillation. SDD decouples the global logit output into multiple local logit outputs and establishes distillation pipelines for them. This helps the student to mine and inherit fine-grained and unambiguous logit knowledge. Moreover, the decoupled knowledge can be further divided into consistent and complementary logit knowledge that…
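The decoupling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that local logits are obtained by average-pooling a per-location logit map (height x width x classes) over a grid of cells at each scale, and that each local logit gets its own temperature-softened KL distillation term. The helper names (`local_logits`, `scale_decoupled_kd_loss`), the choice of scales, and the temperature value are all hypothetical. The split into consistent and complementary knowledge is not modeled here.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of class logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def local_logits(logit_map, scale):
    """Average-pool an H x W x C logit map into scale x scale cells.

    Assumption: this pooling scheme stands in for "decoupling the global
    logit into local logits". With scale=1 the single cell is the
    global (spatially averaged) logit.
    """
    height, width = len(logit_map), len(logit_map[0])
    num_classes = len(logit_map[0][0])
    cells = []
    for i in range(scale):
        r0, r1 = i * height // scale, (i + 1) * height // scale
        for j in range(scale):
            c0, c1 = j * width // scale, (j + 1) * width // scale
            count = (r1 - r0) * (c1 - c0)
            pooled = [
                sum(logit_map[r][c][k]
                    for r in range(r0, r1)
                    for c in range(c0, c1)) / count
                for k in range(num_classes)
            ]
            cells.append(pooled)
    return cells


def scale_decoupled_kd_loss(teacher_map, student_map,
                            scales=(1, 2), temperature=4.0):
    """Sum one KD term per local logit, across all scales (hypothetical)."""
    total = 0.0
    for scale in scales:
        teacher_cells = local_logits(teacher_map, scale)
        student_cells = local_logits(student_map, scale)
        for t_logit, s_logit in zip(teacher_cells, student_cells):
            p_teacher = softmax(t_logit, temperature)
            p_student = softmax(s_logit, temperature)
            total += temperature ** 2 * kl_divergence(p_teacher, p_student)
    return total


# Toy 2 x 2 logit maps with 3 classes; the values are made up.
teacher = [[[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]],
           [[0.0, 0.0, 2.0], [2.0, 0.0, 0.0]]]
student = [[[1.0, 0.5, 0.0], [0.5, 1.0, 0.0]],
           [[0.0, 0.5, 1.0], [1.0, 0.0, 0.5]]]

loss = scale_decoupled_kd_loss(teacher, student)
```

In this toy input the global logit at scale 1 averages away the fact that different regions favor different classes, while the scale-2 cells keep that region-level information, which is the intuition behind distilling local logits alongside the global one.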
Taxonomy
Topics: Process Optimization and Integration
Methods: Focus · Knowledge Distillation
