Dynamic Temperature Knowledge Distillation
Yukang Wei, Yu Bai

TL;DR
This paper introduces Dynamic Temperature Knowledge Distillation (DTKD), a novel method that adaptively adjusts temperature parameters for teacher and student models during training, enhancing knowledge transfer by considering sample difficulty and model output smoothness.
Contribution
The paper proposes a dynamic, cooperative temperature control mechanism using sharpness as a metric, improving upon static temperature methods in knowledge distillation.
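For context, the static-temperature baseline that this contribution improves on can be sketched as follows. This is a minimal illustration of conventional temperature-scaled distillation, not code from the paper; the function names and the default temperature of 4.0 are assumptions chosen for the example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature), computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def static_kd_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) at one shared, fixed temperature, scaled by T^2.

    The same temperature is used for every sample and for both models,
    which is the "static" setting the summary above refers to.
    """
    p = softmax_with_temperature(teacher_logits, temperature)
    q = softmax_with_temperature(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

A higher temperature flattens both distributions, so the student sees softer labels; with a static temperature that degree of softening is identical for easy and hard samples.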
Findings
DTKD achieves comparable performance to leading KD methods on CIFAR-100 and ImageNet.
DTKD demonstrates increased robustness in Target Class KD and Non-target Class KD scenarios.
The method effectively adapts temperatures based on sample difficulty and output smoothness.
Abstract
Temperature plays a pivotal role in moderating label softness in the realm of knowledge distillation (KD). Traditional approaches often employ a static temperature throughout the KD process, which fails to address the nuanced complexities of samples with varying levels of difficulty and overlooks the distinct capabilities of different teacher-student pairings. This leads to a less-than-ideal transfer of knowledge. To improve the process of knowledge propagation, we propose Dynamic Temperature Knowledge Distillation (DTKD), which introduces a dynamic, cooperative temperature control for both teacher and student models simultaneously within each training iteration. In particular, we propose "sharpness" as a metric to quantify the smoothness of a model's output distribution. By minimizing the sharpness difference between the teacher and the student, we can derive sample-specific…
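The abstract is cut off before it defines sharpness or states the resulting temperature rule, so the sketch below is only one plausible reading, not the authors' formulation. It assumes that sharpness is measured as the log-sum-exp of a sample's logits and that a shared reference temperature is split between teacher and student in proportion to their sharpness. The names `sharpness`, `cooperative_temperatures`, and `base_temperature` are hypothetical.

```python
import math


def sharpness(logits):
    """Assumed sharpness measure: log-sum-exp of the logits (not confirmed by the excerpt)."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))


def cooperative_temperatures(teacher_logits, student_logits, base_temperature=4.0):
    """Split a reference temperature between teacher and student for one sample.

    Assumption: each model's temperature is proportional to its sharpness,
    and the two temperatures average to base_temperature. The sharper
    (more peaked) output therefore receives the higher temperature.
    """
    s_teacher = sharpness(teacher_logits)
    s_student = sharpness(student_logits)
    if s_teacher <= 0 or s_student <= 0:
        # The proportional split below only makes sense for positive sharpness.
        raise ValueError("this sketch assumes positive sharpness values")
    total = s_teacher + s_student
    t_teacher = 2.0 * s_teacher / total * base_temperature
    t_student = 2.0 * s_student / total * base_temperature
    return t_teacher, t_student


# Example: a confident teacher and a less confident student on one sample.
t_tea, t_stu = cooperative_temperatures([8.0, 1.0, 0.5], [3.0, 1.0, 0.5])
```

Under these assumptions the ratio of sharpness to temperature is the same for both models, which is one way to narrow the sharpness gap per sample while keeping the average temperature at the reference value.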
Taxonomy
Topics: Neural Networks and Applications
Methods: Knowledge Distillation
