From Knowledge Distillation to Self-Knowledge Distillation: A Unified Approach with Normalized Loss and Customized Soft Labels

Zhendong Yang; Ailing Zeng; Zhe Li; Tianke Zhang; Chun Yuan; Yu Li

arXiv:2303.13005 · cs.CV · July 18, 2023 · 5 cites

TL;DR

This paper introduces a unified framework for knowledge distillation and self-knowledge distillation using normalized loss and customized soft labels, achieving state-of-the-art results on multiple datasets.

Contribution

It proposes a unified formulation for KD and self-KD, introducing NKD and USKD methods that improve distillation effectiveness and are applicable to various model architectures.

Findings

01 NKD achieves state-of-the-art performance on CIFAR-100 and ImageNet.

02 USKD effectively applies to CNN and ViT models with negligible additional cost.

03 USKD yields significant accuracy gains on ImageNet for MobileNet and DeiT-Tiny.

Abstract

Knowledge Distillation (KD) uses the teacher's prediction logits as soft labels to guide the student, while self-KD does not need a real teacher to obtain the soft labels. This work unifies the formulations of the two tasks by decomposing and reorganizing the generic KD loss into a Normalized KD (NKD) loss, and by proposing Universal Self-Knowledge Distillation (USKD), which customizes soft labels for both the target class (the image's category) and the non-target classes. We decompose the KD loss and find that its non-target term forces the student's non-target logits to match the teacher's, but the student's and teacher's non-target logits sum to different values, which prevents them from becoming identical. NKD normalizes the non-target logits so that their sums are equal. It can be used for both KD and self-KD to make better use of the soft labels in the distillation loss. USKD generates customized soft labels for both target and non-target…
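The decomposition and normalization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the loss is a soft-label cross-entropy over softmax probabilities, it omits any temperature or loss weighting the paper may use, and the function names (`softmax`, `normalize_non_target`, `kd_terms`) are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def normalize_non_target(probs, target):
    """Drop the target class and rescale the remaining probabilities to sum to 1."""
    rest = [p for i, p in enumerate(probs) if i != target]
    total = sum(rest)
    return [p / total for p in rest]

def kd_terms(teacher_logits, student_logits, target):
    """Split the soft-label cross-entropy into target and non-target parts.

    Returns (target_loss, plain_non_target, normalized_non_target).
    Assumption: plain cross-entropy with no temperature or weighting.
    """
    t = softmax(teacher_logits)
    s = softmax(student_logits)
    # Target term of the soft-label cross-entropy.
    target_loss = -t[target] * math.log(s[target])
    # Plain non-target term: teacher side sums to 1 - t[target],
    # student side sums to 1 - s[target], so the two sums generally differ.
    plain_non_target = -sum(
        t[i] * math.log(s[i]) for i in range(len(t)) if i != target
    )
    # Normalized non-target term: both sides are rescaled to sum to 1.
    t_hat = normalize_non_target(t, target)
    s_hat = normalize_non_target(s, target)
    normalized_non_target = -sum(a * math.log(b) for a, b in zip(t_hat, s_hat))
    return target_loss, plain_non_target, normalized_non_target
```

Calling `kd_terms([2.0, 1.0, 0.1], [1.5, 0.5, 0.3], target=0)` returns the three terms for a toy three-class example. The first two add up to the ordinary soft-label cross-entropy, while the third compares the two non-target distributions after each has been rescaled to sum to 1.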

Code & Models

Repositories

yzd-v/cls_KD
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification