Knowledge Diffusion for Distillation
Tao Huang, Yuan Zhang, Mingkai Zheng, Shan You, Fei Wang, Chen Qian, Chang Xu

TL;DR
DiffKD introduces a diffusion-based method to explicitly denoise student features in knowledge distillation, effectively reducing noise and improving performance across multiple vision tasks.
Contribution
The paper proposes a novel diffusion model approach for knowledge distillation that explicitly denoises student features, outperforming existing methods.
Findings
Achieves state-of-the-art results on image classification, object detection, and segmentation.
Effective across various feature types and model capacities.
Uses a lightweight diffusion model to limit the computational overhead that denoising adds.
Abstract
The representation gap between teacher and student is an emerging topic in knowledge distillation (KD). To reduce the gap and improve the performance, current methods often resort to complicated training schemes, loss functions, and feature alignments, which are task-specific and feature-specific. In this paper, we state that the essence of these methods is to discard the noisy information and distill the valuable information in the feature, and propose a novel KD method dubbed DiffKD, to explicitly denoise and match features using diffusion models. Our approach is based on the observation that student features typically contain more noise than teacher features due to the smaller capacity of the student model. To address this, we propose to denoise student features using a diffusion model trained by teacher features. This allows us to perform better distillation between the refined clean…
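The idea described in the abstract (train a diffusion model on teacher features, treat the student feature as a noisy sample, denoise it, then distill against the teacher) can be illustrated with a toy sketch. This is not the authors' implementation. The linear beta schedule, the deterministic DDIM-style reverse pass, the starting step, the feature size, and every function name below are assumptions chosen for illustration. A hypothetical oracle stands in for the trained noise predictor so that the example runs without a neural network.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def add_noise(x0, alpha_bar, rng):
    """Forward process: x_t = sqrt(abar) * x0 + sqrt(1 - abar) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)], eps


def denoise(x_t, noise_predictor, alpha_bars, start_step):
    """Deterministic (DDIM-style) reverse pass from start_step down to step 0."""
    x = list(x_t)
    for t in range(start_step, -1, -1):
        a, b = math.sqrt(alpha_bars[t]), math.sqrt(1.0 - alpha_bars[t])
        eps_hat = noise_predictor(x, t)
        # Estimate the clean feature from the current sample and the noise estimate.
        x0_hat = [(xi - b * ei) / a for xi, ei in zip(x, eps_hat)]
        if t == 0:
            return x0_hat
        # Re-noise the estimate to the previous (less noisy) step.
        a_prev = math.sqrt(alpha_bars[t - 1])
        b_prev = math.sqrt(1.0 - alpha_bars[t - 1])
        x = [a_prev * x0i + b_prev * ei for x0i, ei in zip(x0_hat, eps_hat)]
    return x


def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
alpha_bars = linear_alpha_bars(num_steps=10)

# Toy features: the student feature is modeled as teacher feature plus noise.
teacher = [rng.gauss(0.0, 1.0) for _ in range(8)]
student = [t + 0.3 * rng.gauss(0.0, 1.0) for t in teacher]


def oracle_predictor(x, t):
    """Hypothetical stand-in for a trained noise predictor (it peeks at the teacher)."""
    a, b = math.sqrt(alpha_bars[t]), math.sqrt(1.0 - alpha_bars[t])
    return [(xi - a * ti) / b for xi, ti in zip(x, teacher)]


# Training objective on teacher features: predict the noise that was added.
noisy_teacher, eps = add_noise(teacher, alpha_bars[4], rng)
train_loss = mse(oracle_predictor(noisy_teacher, 4), eps)

# Distillation: denoise the student feature, then match it to the teacher.
denoised = denoise(student, oracle_predictor, alpha_bars, start_step=4)
loss_before = mse(student, teacher)
loss_after = mse(denoised, teacher)
```

Because the oracle predictor knows the teacher feature, the denoised student matches the teacher almost exactly. A learned predictor would only approximate this, and the distillation loss on the denoised feature would then carry the training signal back to the student.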
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification
Methods: Diffusion · Knowledge Distillation
