Decoupling Dark Knowledge via Block-wise Logit Distillation for Feature-level Alignment

Chengting Yu; Fengzhao Zhang; Ruizhe Chen; Aili Wang; Zuozhu Liu; Shurun Tan; Er-Ping Li

arXiv:2411.01547·cs.LG·December 4, 2024

Open Access

TL;DR

This paper introduces a block-wise logit distillation framework that bridges the gap between feature-based and logit-based knowledge distillation, achieving competitive results and offering new insights into their fundamental differences.

Contribution

It proposes a novel block-wise logit distillation method that implicitly aligns features, unifying and enhancing existing KD approaches with superior performance.

Findings

01. Achieves comparable or better results than state-of-the-art KD methods.

02. Demonstrates the potential of combining logit and feature-based distillation.

03. Provides a unified perspective on feature and logit alignment.

Abstract

Knowledge Distillation (KD), a learning scheme in which a larger teacher network guides a smaller student network, transfers dark knowledge from the teacher to the student via logits or intermediate features, with the aim of producing a well-performing lightweight model. Notably, many subsequent feature-based KD methods outperformed the earliest logit-based KD method and successively produced numerous state-of-the-art distillation methods. Nevertheless, recent work has uncovered the potential of the logit-based method, bringing the simple logit-based form of KD back into the limelight. Features or logits? Each implements KD only partially and from an entirely distinct perspective; therefore, choosing between logits and features is not straightforward. This paper provides a unified perspective of feature alignment in order to better understand their fundamental distinction.…
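To make the logits-versus-features contrast concrete, here is a minimal sketch of the two generic loss families the abstract refers to. It is not the paper's block-wise method: it assumes the classic temperature-softened KL divergence for the logit-based loss and a plain mean-squared error for the feature-based loss. The function names (`log_softmax`, `logit_kd_loss`, `feature_mse_loss`) and the default temperature are hypothetical choices made for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_norm for s in scaled]


def logit_kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Assumed classic logit-based KD loss: T^2 * KL(teacher || student).

    Both distributions are softened with the same temperature; the T^2
    factor is the usual rescaling that keeps gradient magnitudes comparable.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    return temperature ** 2 * kl


def feature_mse_loss(student_features, teacher_features):
    """Assumed generic feature-based loss: mean squared error between
    intermediate feature vectors of equal length."""
    if len(student_features) != len(teacher_features):
        raise ValueError("feature vectors must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_features, teacher_features)]
    return sum(squared) / len(squared)


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]
    student = [1.0, 1.0, 0.0]
    print("logit KD loss:", logit_kd_loss(student, teacher))
    print("feature MSE loss:", feature_mse_loss([1.0, 2.0], [1.0, 4.0]))
```

The logit loss compares output distributions only, while the feature loss constrains intermediate representations directly; in practice the student features usually pass through a learned projection first so that their dimensions match the teacher's, which this sketch omits.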

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Face and Expression Recognition