Regularizing Class-wise Predictions via Self-knowledge Distillation
Sukmin Yun, Jongjin Park, Kimin Lee, Jinwoo Shin

TL;DR
This paper introduces a self-knowledge distillation regularizer that improves neural network generalization and calibration by penalizing differences between the predictive distributions of samples that share the same label, which reduces overconfident predictions and intra-class variation.
Contribution
It proposes a class-wise regularization method in which a single network distills its own predictive distributions across different samples of the same label, improving both generalization and calibration.
Findings
Significant improvement in generalization across image classification tasks.
Enhanced calibration performance of convolutional neural networks.
Reduction in overconfident predictions and intra-class variations.
Abstract
Deep neural networks with millions of parameters may suffer from poor generalization due to overfitting. To mitigate the issue, we propose a new regularization method that penalizes the predictive distribution between similar samples. In particular, we distill the predictive distribution between different samples of the same label during training. This results in regularizing the dark knowledge (i.e., the knowledge on wrong predictions) of a single network (i.e., a self-knowledge distillation) by forcing it to produce more meaningful and consistent predictions in a class-wise manner. Consequently, it mitigates overconfident predictions and reduces intra-class variations. Our experimental results on various image classification tasks demonstrate that the simple yet powerful method can significantly improve not only the generalization ability but also the calibration performance of modern…
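The class-wise penalty described in the abstract can be illustrated with a minimal sketch. It assumes the regularizer is a KL divergence between temperature-softened predictions of two different samples with the same label, added to the usual cross-entropy loss. The function names, the `temperature` and `weight` parameters, and the `temperature ** 2` scaling (a common convention in distillation) are illustrative assumptions, not details taken from the text above.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(logits, label):
    """Standard cross-entropy of the predicted distribution against a label."""
    return -math.log(softmax(logits)[label])


def class_wise_loss(logits, label, peer_logits, temperature=4.0, weight=1.0):
    """Cross-entropy plus a penalty for disagreeing with a same-label peer.

    `peer_logits` are the network's outputs for a *different* sample that
    has the same label. In a real training loop the peer prediction would
    typically be treated as a constant target (assumption); this sketch
    only computes the loss value.
    """
    target = softmax(peer_logits, temperature)
    prediction = softmax(logits, temperature)
    penalty = kl_divergence(target, prediction)
    return cross_entropy(logits, label) + weight * temperature ** 2 * penalty


# Hypothetical logits for two samples that both belong to class 0.
sample_logits = [4.0, 1.0, 0.5]
peer_logits = [3.0, 2.5, 0.2]
print(class_wise_loss(sample_logits, 0, peer_logits))
```

When the two samples already produce identical predictions the penalty vanishes and the loss reduces to plain cross-entropy; the further their softened distributions diverge, the larger the added term, which is what pushes predictions within a class toward consistency.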
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
