Distilling Calibrated Student from an Uncalibrated Teacher
Ishan Mishra, Sethu Vamsi Krishna, Deepak Mishra

TL;DR
This paper presents a method to produce a calibrated student neural network from an uncalibrated teacher by combining data augmentation with knowledge distillation techniques, ensuring reliable probability estimates without sacrificing accuracy.
Contribution
It introduces a novel framework that distills calibrated students from uncalibrated teachers using data augmentation, applicable to various distillation methods, validated across multiple datasets.
Findings
Calibrated students outperform uncalibrated teachers in probability reliability.
The approach maintains high accuracy while improving calibration.
Performance remains robust on corrupted datasets.
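The findings above turn on how "probability reliability" is measured. This summary does not name the metric, so the sketch below assumes expected calibration error (ECE), a common choice: predictions are grouped into confidence bins, and the gap between average confidence and accuracy in each bin is averaged, weighted by bin size. The function name `expected_calibration_error` and the default of 10 bins are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE (assumed metric; the summary does not specify one).

    probs:  (N, C) array of predicted class probabilities.
    labels: (N,) array of integer class labels.
    """
    confidences = probs.max(axis=1)            # confidence of the top prediction
    predictions = probs.argmax(axis=1)         # predicted class
    correct = (predictions == labels).astype(float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)  # equal-width confidence bins
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # |accuracy - confidence| in this bin, weighted by bin population
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)

# Toy check: a model that is always fully confident but right half the time.
overconfident = np.array([[1.0, 0.0], [1.0, 0.0]])
print(expected_calibration_error(overconfident, np.array([0, 1])))
```

A lower value means confidence tracks accuracy more closely; a student would count as better calibrated than its teacher if its ECE on held-out data is smaller.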
Abstract
Knowledge distillation is a common technique for improving the performance of a shallow student network by transferring information from a teacher network, which, in general, is comparatively large and deep. These teacher networks are pre-trained and often uncalibrated, as no calibration technique is applied to the teacher model during training. The calibration of a network measures how well its predicted probabilities reflect the probability of correctness of its predictions, which is critical in high-risk domains. In this paper, we study how to obtain a calibrated student from an uncalibrated teacher. Our approach relies on the fusion of data-augmentation techniques, including but not limited to cutout, mixup, and CutMix, with knowledge distillation. We extend our approach beyond traditional knowledge distillation and find it suitable for Relational Knowledge Distillation and Contrastive Representation Distillation as…
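To make the combination described in the abstract concrete, here is a minimal sketch of one way mixup can be fused with a standard knowledge-distillation loss: inputs and labels are mixed, the frozen teacher and the student both see the mixed inputs, and the student is trained on a weighted sum of a temperature-softened KL term and a cross-entropy term against the mixed labels. This is not the authors' implementation. The helper names (`mixup_batch`, `kd_loss`), the temperature of 4, the loss weight of 0.5, and the toy linear "networks" are all assumptions made for illustration.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)       # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mixup_batch(x, y_onehot, alpha, rng):
    """Mixup: convex combination of each example with a shuffled partner."""
    lam = rng.beta(alpha, alpha)               # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix, lam

def kd_loss(student_logits, teacher_logits, y_soft, temperature=4.0, weight=0.5):
    """Weighted sum of softened KL(teacher || student) and soft-label CE.

    temperature and weight are hypothetical defaults, not values from the paper.
    """
    eps = 1e-12
    p_t = softmax(teacher_logits, temperature)
    p_s_soft = softmax(student_logits, temperature)
    kl = (p_t * (np.log(p_t + eps) - np.log(p_s_soft + eps))).sum(axis=1).mean()
    ce = -(y_soft * np.log(softmax(student_logits) + eps)).sum(axis=1).mean()
    # T^2 rescales the softened term, as in standard knowledge distillation
    return float(weight * temperature**2 * kl + (1.0 - weight) * ce)

# Toy demo: random linear maps stand in for the teacher and student networks.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))
y = np.eye(3)[rng.integers(0, 3, size=8)]      # one-hot labels, 3 classes
W_teacher = rng.normal(size=(5, 3))            # frozen, possibly uncalibrated
W_student = rng.normal(size=(5, 3))

x_mix, y_mix, lam = mixup_batch(x, y, alpha=1.0, rng=rng)
loss = kd_loss(x_mix @ W_student, x_mix @ W_teacher, y_mix)
print(loss)
```

Under the same assumptions, swapping `mixup_batch` for a cutout or CutMix transform would leave `kd_loss` unchanged, since the augmentation only alters the inputs and soft labels that both networks see.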
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications
Methods: CutMix · Knowledge Distillation
