DeepKD: A Deeply Decoupled and Denoised Knowledge Distillation Trainer

Haiduo Huang; Jiangcheng Song; Yadong Zhang; Pengju Ren

arXiv:2505.15133 · cs.CV · May 22, 2025

TL;DR

DeepKD is a knowledge distillation training framework that combines dual-level decoupling with adaptive denoising. It targets two problems: gradient conflicts between target-class and non-target-class knowledge, and noisy signals from low-confidence dark knowledge. It is evaluated on CIFAR-100, ImageNet, and MS-COCO.

Contribution

DeepKD introduces dual-level decoupling, in which each knowledge component has its own independent momentum updater, together with a dynamic top-k mask that filters noisy signals during distillation.
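To make the top-k masking idea concrete, here is a minimal sketch in plain Python. It is not taken from the paper or its repository. The function names `scheduled_k` and `topk_nontarget_mask`, the linear growth of k, and the choice to rank non-target classes by teacher logit are all assumptions made for illustration.

```python
def scheduled_k(step, total_steps, k_min, k_max):
    """Hypothetical linear schedule: k grows from k_min to k_max over training."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return k_min + int((k_max - k_min) * progress)


def topk_nontarget_mask(teacher_logits, target, k):
    """Return a boolean mask that keeps the k highest-scoring non-target classes.

    The target class is always excluded here, on the assumption that it is
    handled by a separate loss term.
    """
    non_target = [i for i in range(len(teacher_logits)) if i != target]
    non_target.sort(key=lambda i: teacher_logits[i], reverse=True)
    keep = set(non_target[:k])
    return [i in keep for i in range(len(teacher_logits))]


# Usage: halfway through training, keep the 2 strongest non-target classes.
teacher_logits = [2.0, 0.5, 1.5, -1.0, 0.1]
k = scheduled_k(step=50, total_steps=100, k_min=1, k_max=4)
mask = topk_nontarget_mask(teacher_logits, target=0, k=k)
print(k, mask)  # 2 [False, True, True, False, False]
```

Classes outside the mask would simply be left out of the non-target distillation term, so low-confidence teacher outputs do not contribute gradient under this sketch.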

Findings

01 Improves accuracy on CIFAR-100, ImageNet, and MS-COCO.

02 Reduces the noise that low-confidence dark knowledge in non-target classes introduces during training.

03 Outperforms existing knowledge distillation methods.

Abstract

Recent advances in knowledge distillation have emphasized the importance of decoupling different knowledge components. While existing methods utilize momentum mechanisms to separate task-oriented and distillation gradients, they overlook the inherent conflict between target-class and non-target-class knowledge flows. Furthermore, low-confidence dark knowledge in non-target classes introduces noisy signals that hinder effective knowledge transfer. To address these limitations, we propose DeepKD, a novel training framework that integrates dual-level decoupling with adaptive denoising. First, through theoretical analysis of gradient signal-to-noise ratio (GSNR) characteristics in task-oriented and non-task-oriented knowledge distillation, we design independent momentum updaters for each component to prevent mutual interference. We observe that the optimal momentum coefficients for…
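The idea of independent momentum updaters can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the class name `DecoupledMomentumSGD`, the component labels `task`, `target`, and `non_target`, and the coefficient values are all hypothetical. The sketch only shows that each gradient component accumulates in its own buffer before the buffers are summed into a single parameter update.

```python
class DecoupledMomentumSGD:
    """Toy SGD with one momentum buffer per gradient component (hypothetical)."""

    def __init__(self, params, lr, momenta):
        self.params = list(params)          # flat list of scalar parameters
        self.lr = lr
        self.momenta = dict(momenta)        # component name -> momentum coefficient
        self.buffers = {
            name: [0.0] * len(self.params) for name in self.momenta
        }

    def step(self, grads):
        """grads maps each component name to its gradient list."""
        # Update each component's buffer independently of the others.
        for name, grad in grads.items():
            mu = self.momenta[name]
            buf = self.buffers[name]
            for i, g in enumerate(grad):
                buf[i] = mu * buf[i] + g
        # Sum the buffers and apply one parameter update.
        for i in range(len(self.params)):
            total = sum(self.buffers[name][i] for name in self.momenta)
            self.params[i] -= self.lr * total
        return self.params


# Usage with arbitrary, illustrative coefficients.
opt = DecoupledMomentumSGD(
    params=[1.0],
    lr=0.1,
    momenta={"task": 0.9, "target": 0.5, "non_target": 0.0},
)
unit_grads = {"task": [1.0], "target": [1.0], "non_target": [1.0]}
opt.step(unit_grads)
opt.step(unit_grads)
print(opt.buffers)
```

With a single shared buffer, all three gradients would be smoothed by the same coefficient. Keeping the buffers apart lets each component be smoothed at its own rate, which is the separation the abstract describes.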

Code & Models

Repositories

haiduo/deepkd
PyTorch · Official

Taxonomy

Topics: Artificial Intelligence in Healthcare · Big Data Technologies and Applications · Data Mining Algorithms and Applications

Methods: Knowledge Distillation