# Decoupled Classifier Knowledge Distillation

**Authors:** Hairui Wang, Mengjie Dong, Guifu Zhu, Ya Li

PMC · DOI: 10.1371/journal.pone.0314267 · PLOS ONE · 2025-02-21

## TL;DR

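This paper proposes Decoupled Classifier Knowledge Distillation (DCKD), which splits the classifier's output into target-class and non-target-class components so that distillation methods can be combined without conveying redundant information. The authors report better results than single-method distillation on CIFAR-100 and ImageNet, with no loss of training efficiency.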

## Contribution

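DCKD decouples the classifier's output into two components: non-target classes learned by the student, and target classes obtained by both the teacher and the student. It fixes the correct knowledge the student has already acquired and encourages the student to further align its output with the teacher's, which lets two distillation methods be merged without redundancy.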

## Key findings

- Compared to using a single distillation method, DCKD achieves superior results on the CIFAR-100 and ImageNet datasets for image classification and object detection tasks.
- DCKD allows relational-based and feature-based distillation to operate more efficiently and flexibly.
- Fixing the correct knowledge the student has already acquired, while further aligning the student's output with the teacher's, improves performance without reducing training efficiency.

## Abstract

Mainstream knowledge distillation methods primarily include self-distillation, offline distillation, online distillation, output-based distillation, and feature-based distillation. While each approach has its respective advantages, they are typically employed independently. Simply combining two distillation methods often leads to redundant information. If the information conveyed by both methods is highly similar, this can result in wasted computational resources and increased complexity. To provide a new perspective on distillation research, we aim to explore a compromise solution that aligns complex features without conflicting with output alignment. In this work, we propose to decouple the classifier’s output into two components: non-target classes learned by the student, and target classes obtained by both the teacher and the student. Finally, we introduce Decoupled Classifier Knowledge Distillation (DCKD), where on one hand, we fix the correct knowledge that the student has already acquired, which is crucial for merging the two methods; on the other hand, we encourage the student to further align its output with that of the teacher. Compared to using a single method, DCKD achieves superior results on both the CIFAR-100 and ImageNet datasets for image classification and object detection tasks, without reducing training efficiency. Moreover, it allows relational-based and feature-based distillation to operate more efficiently and flexibly. This work demonstrates the great potential of integrating distillation methods, and we hope it will inspire future research.
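The abstract describes the decoupling only in words, so the following is a minimal sketch of what splitting a classifier's output into a target-class part and a non-target-class part could look like. It is an illustration under stated assumptions, not the DCKD loss itself: the helper names (`softmax`, `decouple`, `kl_divergence`), the temperature of 4.0, the example logits, and the use of KL divergence to compare teacher and student are all hypothetical choices that do not come from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def decouple(probs, target, eps=1e-12):
    """Split a class distribution into (target probability,
    renormalised distribution over the non-target classes)."""
    p_target = probs[target]
    rest = max(1.0 - p_target, eps)  # guard against division by zero
    non_target = [p / rest for i, p in enumerate(probs) if i != target]
    return p_target, non_target

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical logits for a 4-class problem; class 0 is the ground-truth target.
teacher_logits = [4.0, 1.0, 0.5, -1.0]
student_logits = [2.5, 1.5, 0.0, -0.5]
target = 0
temperature = 4.0  # assumed value, not taken from the paper

t_target, t_rest = decouple(softmax(teacher_logits, temperature), target)
s_target, s_rest = decouple(softmax(student_logits, temperature), target)

# Compare the two components separately (an assumed way to measure alignment).
target_gap = kl_divergence([t_target, 1.0 - t_target], [s_target, 1.0 - s_target])
non_target_gap = kl_divergence(t_rest, s_rest)

print(f"target-class gap:     {target_gap:.4f}")
print(f"non-target-class gap: {non_target_gap:.4f}")
```

Keeping the two components separate means each can be weighted, frozen, or aligned independently, which is the kind of flexibility the abstract attributes to decoupling. How DCKD actually fixes the student's correct knowledge and combines the terms is specified in the full paper, not in this sketch.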

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11844843/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11844843/full.md

## References

6 references — full list in the complete paper: https://tomesphere.com/paper/PMC11844843/full.md

---
Source: https://tomesphere.com/paper/PMC11844843