Progressive Class-level Distillation
Jiayan Li, Jun Li, Zhourui Zhang, Jianhua Xu

TL;DR
Progressive Class-level Distillation (PCD) improves knowledge transfer in logit distillation through stage-wise, prioritized, and bidirectional learning, with reported gains on classification and detection tasks.
Contribution
The paper introduces PCD, a stage-wise, prioritized, and bidirectional logit distillation method that transfers knowledge progressively, one class group at a time.
Findings
Outperforms state-of-the-art methods on benchmark datasets.
Effective in both classification and detection tasks.
Improves teacher-student logit alignment through stage-wise, prioritized learning.
Abstract
In knowledge distillation (KD), logit distillation (LD) aims to transfer class-level knowledge from a more powerful teacher network to a small student model via accurate teacher-student alignment at the logits level. Since high-confidence object classes usually dominate the distillation process, low-probability classes, which also contain discriminative information, are downplayed in conventional methods, leading to insufficient knowledge transfer. To address this issue, we propose a simple yet effective LD method termed Progressive Class-level Distillation (PCD). In contrast to existing methods, which perform all-class ensemble distillation, our PCD approach performs stage-wise distillation for step-by-step knowledge transfer. More specifically, we rank the teacher-student logit differences to identify distillation priority from scratch, and subsequently divide the entire LD…
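The ranking and stage-wise steps described in the abstract can be illustrated with a minimal sketch. The abstract is truncated here, so several details are assumptions rather than the authors' method: the gap is taken as the absolute teacher-student logit difference, classes with the largest gap are placed first, the ranked classes are split into equal-sized contiguous groups, and each stage uses a temperature-scaled KL divergence restricted to its group. The function names (`rank_classes_by_gap`, `split_into_stages`, `group_kl`) are hypothetical, and the bidirectional component is not modeled.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def rank_classes_by_gap(teacher_logits, student_logits):
    """Class indices sorted by descending |teacher - student| logit gap.

    Assumption: a larger gap means higher distillation priority.
    """
    gaps = [abs(t - s) for t, s in zip(teacher_logits, student_logits)]
    return sorted(range(len(gaps)), key=lambda c: gaps[c], reverse=True)


def split_into_stages(ranked_classes, num_stages):
    """Split ranked class indices into contiguous groups, one per stage.

    Assumption: groups are equal-sized (the last one may be shorter).
    """
    size = math.ceil(len(ranked_classes) / num_stages)
    return [ranked_classes[i:i + size]
            for i in range(0, len(ranked_classes), size)]


def group_kl(teacher_logits, student_logits, group, temperature=1.0):
    """KL(teacher || student) over a softmax restricted to one class group."""
    p = softmax([teacher_logits[c] for c in group], temperature)
    q = softmax([student_logits[c] for c in group], temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy logits for a single sample with six classes (illustrative values only).
teacher = [5.0, 1.0, 0.5, -1.0, -2.0, 0.0]
student = [2.0, 0.8, 0.5, 0.5, -2.1, 1.0]

ranked = rank_classes_by_gap(teacher, student)
stages = split_into_stages(ranked, num_stages=3)
stage_losses = [group_kl(teacher, student, g, temperature=4.0) for g in stages]
```

In a training loop, a schedule would decide which stage's loss is active at each step. That schedule is not specified in the visible part of the abstract, so it is left out of the sketch.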
Taxonomy
Methods: Knowledge Distillation
