Distilling Calibrated Student from an Uncalibrated Teacher

Ishan Mishra; Sethu Vamsi Krishna; Deepak Mishra

arXiv:2302.11472 · cs.CV · February 23, 2023 · 1 citation

TL;DR

This paper presents a method for producing a calibrated student neural network from an uncalibrated teacher by combining data augmentation with knowledge distillation, yielding reliable probability estimates without sacrificing accuracy.
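To make the combination of data augmentation and knowledge distillation concrete, here is a minimal sketch in pure Python. It assumes a Hinton-style objective: a temperature-scaled KL term between teacher and student outputs on a mixup-augmented input, plus cross-entropy against the mixed hard labels. The toy linear "networks" (linear), the weights, the temperature of 4, the mixing weight alpha = 0.5 and the Beta(1, 1) prior on the mixup coefficient are all hypothetical choices for illustration; this section does not state the authors' exact loss or hyperparameters.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def mixup(x1, x2, lam):
    """Mixup: convex combination of two flattened inputs."""
    return [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Soft-target term: T^2 * KL(teacher_T || student_T)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl

def mixup_ce(student_logits, y1, y2, lam):
    """Hard-label term: lam * CE(y1) + (1 - lam) * CE(y2)."""
    p = softmax(student_logits)
    return -(lam * math.log(p[y1]) + (1 - lam) * math.log(p[y2]))

def linear(weights, x):
    """Hypothetical stand-in for a network: one weight row per class."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

rng = random.Random(0)
lam = rng.betavariate(1.0, 1.0)  # mixup coefficient, Beta(1, 1) assumed
x1, y1 = [1.0, 0.0, 2.0], 0
x2, y2 = [0.0, 1.0, -1.0], 1
x_mix = mixup(x1, x2, lam)

teacher_w = [[2.0, -1.0, 0.5], [-1.0, 2.0, 0.0]]  # frozen, possibly uncalibrated teacher
student_w = [[1.0, -0.5, 0.2], [-0.5, 1.0, 0.1]]

# Teacher and student both see the same augmented input.
s_logits, t_logits = linear(student_w, x_mix), linear(teacher_w, x_mix)
alpha = 0.5  # hypothetical weight between the hard-label and soft-target terms
total = alpha * mixup_ce(s_logits, y1, y2, lam) + (1 - alpha) * kd_loss(s_logits, t_logits)
print(f"lam={lam:.3f} loss={total:.4f}")
```

Because the augmentation only changes the inputs and the hard-label weights, the same pattern could in principle wrap other distillation losses; whether this matches the authors' exact formulation is not stated here.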

Contribution

It introduces a novel framework that distills calibrated students from uncalibrated teachers using data augmentation; the framework applies to several distillation methods and is validated on multiple datasets.

Findings

1. Calibrated students produce more reliable probability estimates than their uncalibrated teachers.
2. The approach maintains high accuracy while improving calibration.
3. The approach also performs robustly on corrupted datasets.
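Calibration is commonly quantified with the expected calibration error (ECE): predictions are grouped into equal-width confidence bins B_m, and ECE = sum over m of (|B_m| / n) * |acc(B_m) - conf(B_m)|, the sample-weighted gap between each bin's accuracy and its mean confidence. The sketch below assumes ECE with 15 bins by default, a common choice; this section does not say which calibration metric or bin count the authors report, and the toy probabilities and labels are made up for illustration.

```python
def confidence_and_prediction(probs):
    """Top-1 confidence and predicted class from one probability vector."""
    pred = max(range(len(probs)), key=lambda k: probs[k])
    return probs[pred], pred

def expected_calibration_error(confidences, predictions, labels, n_bins=15):
    """ECE = sum over bins of (|B_m| / n) * |acc(B_m) - conf(B_m)|."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; the first bin also takes confidence exactly 0.
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        acc = sum(predictions[i] == labels[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(acc - conf)
    return ece

# Toy softmax outputs for four examples (made up, not from the paper).
probs = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.4], [0.6, 0.4]]
labels = [0, 0, 0, 1]
pairs = [confidence_and_prediction(p) for p in probs]
confs = [c for c, _ in pairs]
preds = [k for _, k in pairs]
print(expected_calibration_error(confs, preds, labels, n_bins=10))
```

A perfectly calibrated model has an ECE of 0. In this toy case the predictions made with 0.9 confidence are right only half the time, so that bin contributes most of the error.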

Abstract

Knowledge distillation is a common technique for improving the performance of a shallow student network by transferring information from a teacher network, which is generally larger and deeper. These teacher networks are pre-trained and often uncalibrated, because no calibration technique is applied to the teacher model during training. Calibration measures how well a network's predicted confidence matches the actual probability that its predictions are correct, which is critical in high-risk domains. In this paper, we study how to obtain a calibrated student from an uncalibrated teacher. Our approach fuses data-augmentation techniques, including but not limited to cutout, mixup, and CutMix, with knowledge distillation. We extend our approach beyond traditional knowledge distillation and find it suitable for Relational Knowledge Distillation and Contrastive Representation Distillation as…
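Of the augmentations named in the abstract, cutout is the simplest: a square patch of the input is masked out while the label stays unchanged. A minimal sketch on a 2-D list, assuming a single grayscale channel and a patch centred at a uniformly random pixel and clipped at the borders, as in the original cutout formulation. The 8x8 image and the patch size of 4 are hypothetical; the patch size and application probability used in the paper are not given in this section.

```python
import random

def cutout(image, size, rng):
    """Return a copy of image with a square patch around a random pixel set to zero.

    The patch covers rows cy - size // 2 .. cy + size // 2 - 1 (likewise for
    columns), i.e. 2 * (size // 2) pixels before clipping at the borders, so it
    can be smaller near an edge.
    """
    h, w = len(image), len(image[0])
    cy, cx = rng.randrange(h), rng.randrange(w)
    y0, y1 = max(0, cy - size // 2), min(h, cy + size // 2)
    x0, x1 = max(0, cx - size // 2), min(w, cx + size // 2)
    out = [row[:] for row in image]  # copy so the caller's image is untouched
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = 0.0
    return out

img = [[1.0] * 8 for _ in range(8)]
augmented = cutout(img, size=4, rng=random.Random(0))
for row in augmented:
    print(" ".join(f"{v:.0f}" for v in row))
```

In a distillation setup the masked image would typically be fed to both teacher and student, so the teacher's soft targets reflect the occlusion; the section does not spell this out.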

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications

Methods: CutMix · Knowledge Distillation
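CutMix pastes a rectangular region from a second image into the first and mixes the labels in proportion to the pasted area. A minimal sketch following the usual CutMix recipe: draw λ ~ Beta(α, α), size the box so its area fraction is about 1 - λ, centre it at a random pixel, clip it to the image, then recompute λ from the clipped box. The 2-D list images, alpha = 1.0 and the two-class labels are hypothetical; this section does not give the α or mixing probability the authors use.

```python
import math
import random

def cutmix(img_a, img_b, alpha, rng):
    """Paste a random box from img_b into a copy of img_a.

    Returns the mixed image and lam, the fraction of pixels still taken from
    img_a, which is also the weight given to img_a's label.
    """
    h, w = len(img_a), len(img_a[0])
    lam = rng.betavariate(alpha, alpha)
    cut_h, cut_w = int(h * math.sqrt(1 - lam)), int(w * math.sqrt(1 - lam))
    cy, cx = rng.randrange(h), rng.randrange(w)
    y0, y1 = max(0, cy - cut_h // 2), min(h, cy + cut_h // 2)
    x0, x1 = max(0, cx - cut_w // 2), min(w, cx + cut_w // 2)
    mixed = [row[:] for row in img_a]
    for y in range(y0, y1):
        for x in range(x0, x1):
            mixed[y][x] = img_b[y][x]
    # Recompute lam from the clipped box so label weights match pixel areas.
    lam = 1 - (y1 - y0) * (x1 - x0) / (h * w)
    return mixed, lam

def mixed_target(label_a, label_b, lam, num_classes):
    """Soft label lam * onehot(label_a) + (1 - lam) * onehot(label_b)."""
    target = [0.0] * num_classes
    target[label_a] += lam
    target[label_b] += 1 - lam
    return target

img_a = [[0.0] * 8 for _ in range(8)]
img_b = [[1.0] * 8 for _ in range(8)]
mixed, lam = cutmix(img_a, img_b, alpha=1.0, rng=random.Random(1))
print(lam, mixed_target(0, 1, lam, num_classes=2))
```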