Decoupling Dark Knowledge via Block-wise Logit Distillation for Feature-level Alignment
Chengting Yu, Fengzhao Zhang, Ruizhe Chen, Aili Wang, Zuozhu Liu, Shurun Tan, Er-Ping Li

TL;DR
This paper introduces a block-wise logit distillation framework that bridges the gap between feature-based and logit-based knowledge distillation, achieving competitive results and offering new insights into their fundamental differences.
Contribution
It proposes a block-wise logit distillation method that implicitly aligns features, unifying existing logit-based and feature-based KD approaches while matching or exceeding their performance.
Findings
Achieves comparable or better results than state-of-the-art KD methods.
Demonstrates the potential of combining logit and feature-based distillation.
Provides a unified perspective on feature and logit alignment.
Abstract
Knowledge Distillation (KD), a learning scheme in which a larger teacher network guides a smaller student network, transfers dark knowledge from the teacher to the student via logits or intermediate features, with the aim of producing a well-performing lightweight model. Notably, many subsequent feature-based KD methods outperformed the earliest logit-based KD method and iteratively generated numerous state-of-the-art distillation methods. Nevertheless, recent work has uncovered the potential of the logit-based method, bringing the simple logit-based form of KD back into the limelight. Features or logits? Each implements KD only partially and from an entirely distinct perspective; therefore, choosing between logits and features is not straightforward. This paper provides a unified perspective of feature alignment in order to better understand their fundamental distinction.…
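For context, the logit-based KD that the abstract refers to can be sketched as below. This is a minimal illustration of the classic temperature-softened KL loss between teacher and student logits, not the paper's block-wise method, whose details are not given in this summary. The function names, the default temperature of 4.0, and the example logits are assumptions chosen for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def kd_logit_loss(teacher_logits, student_logits, temperature=4.0):
    """Classic logit-based KD loss: T^2 * KL(teacher || student).

    Both distributions are softened with the same temperature
    (an illustrative default, not a value taken from the paper).
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher
    log_q = log_softmax(student_logits, temperature)  # student
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return (temperature ** 2) * kl


# Hypothetical logits for a single 3-class example.
teacher = [5.0, 1.0, -2.0]
student = [2.0, 2.5, -1.0]
loss = kd_logit_loss(teacher, student)
```

The loss is zero when the student's softened distribution matches the teacher's and grows as they diverge. Feature-based methods instead compare intermediate activations, which is the distinction the paper sets out to unify.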
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Face and Expression Recognition
