Dynamic Temperature Knowledge Distillation

Yukang Wei; Yu Bai

arXiv:2404.12711 · cs.LG · April 22, 2024 · 2 cites

TL;DR

This paper introduces Dynamic Temperature Knowledge Distillation (DTKD), a novel method that adaptively adjusts temperature parameters for teacher and student models during training, enhancing knowledge transfer by considering sample difficulty and model output smoothness.

Contribution

The paper proposes a dynamic, cooperative temperature control mechanism using sharpness as a metric, improving upon static temperature methods in knowledge distillation.

Findings

1. DTKD achieves performance comparable to leading KD methods on CIFAR-100 and ImageNet.

2. DTKD demonstrates increased robustness in Target Class KD and Non-target Class KD scenarios.

3. The method adapts temperatures based on sample difficulty and output smoothness.

Abstract

Temperature plays a pivotal role in moderating label softness in knowledge distillation (KD). Traditional approaches often employ a static temperature throughout the KD process, which fails to account for samples of varying difficulty and overlooks the distinct capabilities of different teacher-student pairings. This leads to a less-than-ideal transfer of knowledge. To improve knowledge propagation, we propose Dynamic Temperature Knowledge Distillation (DTKD), which introduces a dynamic, cooperative temperature control for both teacher and student models simultaneously within each training iteration. In particular, we propose "sharpness" as a metric to quantify the smoothness of a model's output distribution. By minimizing the sharpness difference between the teacher and the student, we can derive sample-specific…
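The abstract breaks off before stating how the temperatures are derived, so the following is only a minimal sketch of what a cooperative, per-sample temperature rule could look like. It is not taken from the text above. It assumes that sharpness is measured as the log-sum-exp of a model's logits and that a reference temperature is split between teacher and student in proportion to their sharpness. The function names (`sharpness`, `cooperative_temperatures`, `kd_loss`) and the default `base_temperature=4.0` are hypothetical choices made for illustration.

```python
import math


def sharpness(logits):
    """ASSUMPTION: sharpness is the log-sum-exp of the raw logits."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))


def softmax(logits, temperature):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cooperative_temperatures(teacher_logits, student_logits, base_temperature=4.0):
    """ASSUMPTION: split 2 * base_temperature in proportion to sharpness.

    The sharper model receives the higher temperature, and the two
    temperatures always average to base_temperature. The rule presumes
    both sharpness values are positive.
    """
    s_teacher = sharpness(teacher_logits)
    s_student = sharpness(student_logits)
    total = s_teacher + s_student
    t_teacher = 2.0 * s_teacher / total * base_temperature
    t_student = 2.0 * s_student / total * base_temperature
    return t_teacher, t_student


def kd_loss(teacher_logits, student_logits, base_temperature=4.0):
    """KL(teacher || student) for one sample, each side softened by its own temperature."""
    t_teacher, t_student = cooperative_temperatures(
        teacher_logits, student_logits, base_temperature
    )
    p = softmax(teacher_logits, t_teacher)
    q = softmax(student_logits, t_student)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# A confident teacher and a flatter student on a single three-class sample.
teacher = [8.0, 2.0, 1.0]
student = [3.0, 2.0, 1.0]
t_teacher, t_student = cooperative_temperatures(teacher, student)
loss = kd_loss(teacher, student)
```

Under these assumptions the confident teacher is softened more than the student, which narrows the gap in output smoothness before the two distributions are compared. A fixed-temperature baseline corresponds to using `base_temperature` on both sides.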

Code & Models

Repositories

JinYu1998/DTKD (PyTorch, official)

Taxonomy

Topics: Neural Networks and Applications

Methods: Knowledge Distillation