Dual-Head Knowledge Distillation: Enhancing Logits Utilization with an Auxiliary Head

Penghui Yang; Chen-Chen Zong; Sheng-Jun Huang; Lei Feng; Bo An

arXiv:2411.08937 · cs.CV · April 8, 2026

TL;DR

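The paper introduces dual-head knowledge distillation, which assigns the logit-level and probability-level losses to separate classifier heads so that both kinds of information can be used without the performance drop seen when the two losses are simply combined. The authors report improved performance over existing distillation methods.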

Contribution

It proposes a novel dual-head architecture that preserves the benefits of logit and probability losses while avoiding their conflicts, backed by theoretical analysis.
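
To make the idea concrete, here is a minimal pure-Python sketch of routing two losses to two heads that share the same features. It is not the authors' implementation: the helper names (`linear`, `dual_head_loss`), the choice of cross-entropy for the probability-level loss, mean squared error for the logit-level loss, and the unweighted sum are all assumptions made for illustration. The summary above does not specify them.

```python
import math

def linear(weights, features):
    # One linear head: each row of `weights` produces one class logit.
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    return -math.log(probs[label])

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def dual_head_loss(features, w_main, w_aux, teacher_logits, label):
    # Assumption: the main head receives only the probability-level loss,
    # and the auxiliary head receives only the logit-level loss.
    main_logits = linear(w_main, features)
    aux_logits = linear(w_aux, features)
    prob_loss = cross_entropy(softmax(main_logits), label)
    logit_loss = mse(aux_logits, teacher_logits)  # hypothetical logit-level loss
    return prob_loss + logit_loss  # hypothetical unweighted sum

# Toy usage: two classes, two features, identity weights for both heads.
features = [1.0, 2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
loss = dual_head_loss(features, identity, identity, teacher_logits=[1.0, 2.0], label=1)
```

In this sketch both heads read the same features, so gradients from both losses would reach a shared backbone, while each head's weights would be shaped by only one loss.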

Findings

01. The method outperforms state-of-the-art distillation techniques.

02. The dual-head approach prevents classifier collapse caused by conflicting losses.

03. Extensive experiments validate the effectiveness of the proposed method.

Abstract

Traditional knowledge distillation focuses on aligning the student's predicted probabilities with both the ground-truth labels and the teacher's predicted probabilities. However, converting logits into predicted probabilities obscures some essential information. To address this issue, it is intuitive to add a logit-level loss function as a supplement to the widely used probability-level loss function, in order to exploit the latent information in the logits. Unfortunately, we find empirically that combining the newly introduced logit-level loss with the existing probability-level loss degrades performance, to the point of trailing behind either loss used in isolation. We attribute this phenomenon to the collapse of the classification head, which is verified by our theoretical analysis based on neural collapse theory.…
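
One well-known way in which probabilities hide information that logits carry is that softmax is invariant to adding a constant to every logit. The sketch below illustrates this with toy numbers. The logit values and the use of KL divergence and mean squared error as stand-ins for the probability-level and logit-level losses are assumptions for illustration, not the losses defined in the paper.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions over the same classes.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

teacher_logits = [4.0, 1.0, -2.0]
# The student's logits are the teacher's shifted by a constant of 3.
student_logits = [z + 3.0 for z in teacher_logits]

# Probability-level comparison: the shift disappears after softmax.
prob_loss = kl_divergence(softmax(teacher_logits), softmax(student_logits))

# Logit-level comparison: the shift is still visible.
logit_loss = mse(teacher_logits, student_logits)

print(prob_loss)   # 0.0
print(logit_loss)  # 9.0
```

A probability-level loss treats these two outputs as identical, while a logit-level loss does not, which is one reason a logit-level term can supply a training signal that the probability-level term cannot.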

Code & Models

Repositories

penghui-yang/DHKD (GitHub)
