DeepKD: A Deeply Decoupled and Denoised Knowledge Distillation Trainer
Haiduo Huang, Jiangcheng Song, Yadong Zhang, Pengju Ren

TL;DR
DeepKD is a knowledge distillation training framework that combines dual-level decoupling with adaptive denoising. It targets two problems in knowledge transfer: gradient conflicts between knowledge components and noisy signals from low-confidence non-target classes. It is validated on CIFAR-100, ImageNet, and MS-COCO.
Contribution
DeepKD introduces dual-level decoupling, in which each gradient component gets its own independent momentum updater, together with a dynamic top-k mask that filters noisy low-confidence non-target classes out of the distillation signal.
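The top-k masking idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the linear schedule in `dynamic_k`, and the choice to renormalize over the kept classes before computing the KL divergence are all assumptions made for illustration.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax along the last axis."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def topk_nontarget_indices(teacher_logits, target, k):
    """Indices of the k non-target classes the teacher scores highest."""
    logits = np.array(teacher_logits, dtype=float)  # copy, so the input is untouched
    rows = np.arange(logits.shape[0])
    logits[rows, target] = -np.inf                  # never select the target class
    k = max(1, min(k, logits.shape[1] - 1))
    return np.argsort(-logits, axis=1)[:, :k]

def dynamic_k(epoch, total_epochs, k_min, k_max):
    """Hypothetical schedule (an assumption): widen k linearly over training."""
    frac = epoch / max(1, total_epochs - 1)
    return int(round(k_min + frac * (k_max - k_min)))

def denoised_nontarget_kl(student_logits, teacher_logits, target, k, T=1.0):
    """KL(teacher || student) restricted to the teacher's top-k non-target classes."""
    idx = topk_nontarget_indices(teacher_logits, target, k)
    t = np.take_along_axis(np.asarray(teacher_logits, dtype=float), idx, axis=1)
    s = np.take_along_axis(np.asarray(student_logits, dtype=float), idx, axis=1)
    p_t, p_s = softmax(t, T), softmax(s, T)         # renormalize over kept classes
    return float(np.mean(np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=1)))

# Toy example: one sample, five classes, class 0 is the ground-truth target.
teacher = np.array([[4.0, 1.0, 3.0, 0.5, 2.0]])
student = np.array([[0.0, 2.0, 1.0, 0.0, 3.0]])
target = np.array([0])
kept = topk_nontarget_indices(teacher, target, k=2)
loss = denoised_nontarget_kl(student, teacher, target, k=2)
```

In the toy example, only the two non-target classes the teacher ranks highest contribute to the loss; the remaining low-confidence classes are dropped. How DeepKD actually schedules k is not described in this summary.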
Findings
Improves accuracy on CIFAR-100, ImageNet, and MS-COCO datasets.
Reduces noise from low-confidence dark knowledge in non-target classes during training.
Reports better performance than existing distillation methods.
Abstract
Recent advances in knowledge distillation have emphasized the importance of decoupling different knowledge components. While existing methods utilize momentum mechanisms to separate task-oriented and distillation gradients, they overlook the inherent conflict between target-class and non-target-class knowledge flows. Furthermore, low-confidence dark knowledge in non-target classes introduces noisy signals that hinder effective knowledge transfer. To address these limitations, we propose DeepKD, a novel training framework that integrates dual-level decoupling with adaptive denoising. First, through theoretical analysis of gradient signal-to-noise ratio (GSNR) characteristics in task-oriented and non-task-oriented knowledge distillation, we design independent momentum updaters for each component to prevent mutual interference. We observe that the optimal momentum coefficients for…
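As a rough illustration of independent momentum updaters, the sketch below keeps one momentum buffer per gradient component and sums the buffers into a single parameter update. It assumes three components (task loss, target-class distillation, non-target-class distillation); the class name `DecoupledMomentumSGD`, the component names, and the coefficient values are hypothetical placeholders, and the paper's rule for choosing the coefficients is not reproduced here.

```python
import numpy as np

class DecoupledMomentumSGD:
    """SGD-style update with a separate momentum buffer per gradient component."""

    def __init__(self, lr, betas):
        self.lr = lr
        self.betas = dict(betas)  # one momentum coefficient per component name
        self.buffers = {}         # one momentum buffer per component name

    def step(self, params, grads):
        """Apply one update; `grads` maps component name -> gradient array."""
        params = np.asarray(params, dtype=float)
        update = np.zeros_like(params)
        for name, g in grads.items():
            v = self.buffers.get(name, np.zeros_like(params))
            # Each component accumulates only its own history.
            v = self.betas[name] * v + np.asarray(g, dtype=float)
            self.buffers[name] = v
            update += v
        return params - self.lr * update

# Toy usage with placeholder coefficients (assumptions, not the paper's values).
opt = DecoupledMomentumSGD(lr=0.1, betas={"task": 0.9, "target": 0.5, "nontarget": 0.0})
grads = {
    "task": np.array([0.5, 0.0]),
    "target": np.array([0.0, 1.0]),
    "nontarget": np.array([0.1, 0.1]),
}
params0 = np.array([1.0, -2.0])
params1 = opt.step(params0, grads)
params2 = opt.step(params1, grads)
```

Because the buffers are kept separate, a noisy component can be given a smaller coefficient without damping the others, which a single shared momentum buffer cannot do.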
Taxonomy
Topics: Artificial Intelligence in Healthcare · Big Data Technologies and Applications · Data Mining Algorithms and Applications
Methods: Knowledge Distillation
