Annealing Double-Head: An Architecture for Online Calibration of Deep Neural Networks

Erdong Guo; David Draper; Maria De Iorio

arXiv:2212.13621 · stat.ML · January 18, 2023

TL;DR

This paper introduces Annealing Double-Head, an architecture that calibrates deep neural networks during training, significantly improving confidence estimates without sacrificing accuracy across various datasets and models.

Contribution

The paper proposes a novel calibration architecture with an annealing technique that enhances DNN calibration during training, outperforming existing methods.

Findings

01. Achieves state-of-the-art calibration performance without post-processing.

02. Maintains predictive accuracy comparable to that of other calibration methods.

03. Effective under both in-distribution and distributional shift conditions.
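
Calibration performance of the kind listed above is often summarized with the expected calibration error (ECE), which compares average confidence with observed accuracy inside confidence bins. The sketch below is a minimal illustration of that standard metric, not the paper's evaluation code. Whether the paper reports this exact metric is not stated on this page, and the function name, the equal-width binning, and the default of 10 bins are assumptions.

```python
def expected_calibration_error(confidences, correct, num_bins=10):
    """Equal-width-bin ECE (illustrative; binning scheme is an assumption).

    confidences: predicted confidence per example, each in [0, 1]
    correct:     True/False per example, whether the prediction was right
    """
    n = len(confidences)
    bins = [[] for _ in range(num_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(conf * num_bins), num_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of examples.
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Overconfident toy model: claims 0.9 confidence but is right 3 times out of 4.
overconfident = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, True, True, False]
)
```

A perfectly calibrated model has an ECE of zero; the toy example above has a gap because its stated confidence exceeds its accuracy.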

Abstract

Model calibration, which is concerned with how frequently the model predicts correctly, not only plays a vital part in statistical model design, but also has substantial practical applications, such as optimal decision-making in the real world. However, it has been discovered that modern deep neural networks are generally poorly calibrated due to the overestimation (or underestimation) of predictive confidence, which is closely related to overfitting. In this paper, we propose Annealing Double-Head, a simple-to-implement but highly effective architecture for calibrating the DNN during training. To be precise, we construct an additional calibration head (a shallow neural network that typically has one latent layer) on top of the last latent layer in the normal model to map the logits to the aligned confidence. Furthermore, a simple Annealing technique that dynamically scales the logits by…
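
The double-head idea in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `CalibrationHead`, the hidden width, the tanh and sigmoid activations, and the random initialization are all hypothetical. The abstract is truncated before it defines the annealing factor, so `scale` here is just a hypothetical constant argument applied to the logits before both heads, with no schedule attached.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class CalibrationHead:
    """Hypothetical shallow head: logits -> one hidden layer -> confidence."""

    def __init__(self, num_classes, hidden=8, seed=0):
        rng = random.Random(seed)
        # Small random weights; a real head would be trained, not left random.
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(num_classes)]
                   for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
        self.b2 = 0.0

    def __call__(self, logits):
        # Single latent layer with tanh activation (an assumption).
        h = [math.tanh(sum(w * z for w, z in zip(row, logits)) + b)
             for row, b in zip(self.w1, self.b1)]
        out = sum(w * a for w, a in zip(self.w2, h)) + self.b2
        # Sigmoid keeps the confidence strictly inside (0, 1).
        return 1.0 / (1.0 + math.exp(-out))


def double_head_predict(logits, calib_head, scale=1.0):
    """Return (predicted label, confidence) from the two heads.

    `scale` is a hypothetical stand-in for the annealing factor.
    """
    scaled = [scale * z for z in logits]
    probs = softmax(scaled)                 # main head: class probabilities
    label = max(range(len(probs)), key=probs.__getitem__)
    confidence = calib_head(scaled)         # calibration head: confidence
    return label, confidence


head = CalibrationHead(num_classes=3)
label, confidence = double_head_predict([2.0, 0.5, -1.0], head)
```

The design point this illustrates is the separation of roles: the class label comes from the usual softmax over the logits, while the reported confidence comes from a separate small network that reads the same logits.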

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Speech and Audio Processing · Machine Learning and Data Classification