NormKD: Normalized Logits for Knowledge Distillation

Zhihao Chi; Tu Zheng; Hengjia Li; Zheng Yang; Boxi Wu; Binbin Lin; Deng Cai

arXiv:2308.00520 · cs.CV · August 2, 2023 · 6 cites

Open Access · 1 Repo

TL;DR

NormKD introduces a sample-specific temperature adjustment in logit-based knowledge distillation, significantly improving performance on image classification tasks without extra computational costs.

Contribution

The paper proposes Normalized Knowledge Distillation (NormKD), a novel method that customizes the temperature for each sample based on its logit distribution, enhancing distillation effectiveness.
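A per-sample temperature can be sketched in a few lines. The listing here does not spell out how the temperature is derived from the logit distribution, so the sketch below assumes it is proportional to the standard deviation of each sample's logits. The names `sample_temperature`, `per_sample_kd_loss`, and `t_norm` are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_temperature(logits, t_norm=1.0, eps=1e-7):
    """ASSUMPTION: temperature = t_norm * (population) std of this sample's logits.

    eps keeps the temperature positive when all logits are equal.
    """
    mean = sum(logits) / len(logits)
    var = sum((z - mean) ** 2 for z in logits) / len(logits)
    return t_norm * math.sqrt(var) + eps

def per_sample_kd_loss(teacher_logits, student_logits, t_norm=1.0):
    """KL(teacher || student) for one sample.

    ASSUMPTION: teacher and student are each softened by a temperature
    computed from their own logits.
    """
    p = softmax(teacher_logits, sample_temperature(teacher_logits, t_norm))
    q = softmax(student_logits, sample_temperature(student_logits, t_norm))
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Under this assumption, two samples whose logits differ only by a scale factor are softened to the same distribution, which is the kind of equalization a single fixed temperature cannot provide. The only added work is one standard deviation per sample.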

Findings

01

Significantly better performance on CIFAR-100 and ImageNet.

02

Comparable or superior results to feature-based methods.

03

No extra computational or storage costs.

Abstract

Logit-based knowledge distillation has received less attention in recent years because feature-based methods perform better in most cases. Nevertheless, we find it still has untapped potential when we re-investigate the temperature, a crucial hyper-parameter used to soften the logit outputs. Most previous works set it to a fixed value for the entire distillation procedure. However, because the logits of different samples are distributed very differently, a single temperature cannot soften all of them to an equal degree, so previous methods may transfer the knowledge of each sample inadequately. In this paper, we restudy the temperature hyper-parameter and show that a single value cannot distill the knowledge from each sample sufficiently. To address this issue, we propose Normalized Knowledge Distillation (NormKD), with the…
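The abstract's point about a single temperature can be checked numerically. The sketch below applies one fixed temperature to two hypothetical logit vectors that differ only in scale and compares the entropy of the softened outputs. The values of `T`, `narrow`, and `wide` are made up for illustration and do not come from the paper.

```python
import math

def softmax(logits, temperature):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means a softer distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

T = 4.0                      # one fixed temperature for every sample
narrow = [0.5, 1.0, 1.5]     # illustrative sample with a small logit spread
wide = [5.0, 10.0, 15.0]     # same shape, ten times the spread

h_narrow = entropy(softmax(narrow, T))
h_wide = entropy(softmax(wide, T))

# With the same T, the narrow sample ends up much closer to uniform
# than the wide one, so the two are not softened to an equal degree.
print(h_narrow, h_wide)
```

The narrow sample comes out close to the uniform distribution while the wide one stays visibly peaked, even though both went through the same temperature.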

Peer Reviews

No public reviews on file for this paper yet.

Code & Models

Repositories

gizi1/NormKD
pytorchOfficial

Taxonomy

Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Domain Adaptation and Few-Shot Learning

Methods: Knowledge Distillation