Regularizing Class-wise Predictions via Self-knowledge Distillation

Sukmin Yun; Jongjin Park; Kimin Lee; Jinwoo Shin

arXiv:2003.13964 · cs.LG · April 8, 2020 · 30 citations

TL;DR

This paper introduces a regularization technique based on self-knowledge distillation: it penalizes differences between the predictive distributions of samples that share a label, which improves generalization and calibration while reducing overconfident predictions and intra-class variation.

Contribution

It proposes a novel class-wise regularization method using self-knowledge distillation to enhance neural network performance and calibration.

Findings

01 Significant improvement in generalization across image classification tasks.

02 Enhanced calibration performance of convolutional neural networks.

03 Reduction in overconfident predictions and intra-class variations.

Abstract

Deep neural networks with millions of parameters may suffer from poor generalization due to overfitting. To mitigate the issue, we propose a new regularization method that penalizes the predictive distribution between similar samples. In particular, we distill the predictive distribution between different samples of the same label during training. This results in regularizing the dark knowledge (i.e., the knowledge on wrong predictions) of a single network (i.e., a self-knowledge distillation) by forcing it to produce more meaningful and consistent predictions in a class-wise manner. Consequently, it mitigates overconfident predictions and reduces intra-class variations. Our experimental results on various image classification tasks demonstrate that the simple yet powerful method can significantly improve not only the generalization ability but also the calibration performance of modern…
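
The class-wise regularizer described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation (the official PyTorch code is in the repository listed under Code & Models). The function names, the temperature `temperature`, the weight `weight`, and the temperature-squared scaling borrowed from standard knowledge distillation are assumptions made for illustration. The prediction for a second sample with the same label is treated as a fixed target, and the loss adds a KL term that pulls the first sample's softened prediction toward it.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(logits, label):
    """Standard cross-entropy of the (unsoftened) prediction against the label."""
    return -math.log(softmax(logits)[label])


def class_wise_loss(logits_x, logits_pair, label, temperature=4.0, weight=1.0):
    """Cross-entropy on x plus a class-wise consistency term.

    logits_pair are the logits of another sample with the same label.
    They act as a fixed target here (assumption: no gradient flows through them).
    """
    target = softmax(logits_pair, temperature)
    pred = softmax(logits_x, temperature)
    # Assumption: scale the KL term by temperature**2, as is conventional
    # in knowledge distillation.
    consistency = temperature ** 2 * kl_divergence(target, pred)
    return cross_entropy(logits_x, label) + weight * consistency


# Hypothetical logits for two samples that share label 0.
logits_a = [2.0, 0.5, -1.0]
logits_b = [1.0, 1.5, -0.5]
print(class_wise_loss(logits_a, logits_b, label=0))
```

When both samples produce identical logits the consistency term vanishes and the loss reduces to plain cross-entropy. Any disagreement between the two softened distributions adds a positive penalty, which is how the regularizer encourages consistent predictions within a class.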

Code & Models

Repositories

alinlab/cs-kd (PyTorch, official)

Videos

Regularizing Class-Wise Predictions via Self-Knowledge Distillation · YouTube

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis