Annealing Double-Head: An Architecture for Online Calibration of Deep Neural Networks
Erdong Guo, David Draper, Maria De Iorio

TL;DR
This paper introduces Annealing Double-Head, an architecture that calibrates deep neural networks during training, significantly improving confidence estimates without sacrificing accuracy across various datasets and models.
Contribution
The paper proposes a novel calibration architecture with an annealing technique that enhances DNN calibration during training, outperforming existing methods.
Findings
Achieves state-of-the-art calibration performance without post-processing.
Maintains comparable predictive accuracy to other calibration methods.
Effective under both in-distribution and distributional shift conditions.
Abstract
Model calibration, which is concerned with how frequently the model predicts correctly, not only plays a vital part in statistical model design, but also has substantial practical applications, such as optimal decision-making in the real world. However, it has been discovered that modern deep neural networks are generally poorly calibrated due to the overestimation (or underestimation) of predictive confidence, which is closely related to overfitting. In this paper, we propose Annealing Double-Head, a simple-to-implement but highly effective architecture for calibrating the DNN during training. To be precise, we construct an additional calibration head, a shallow neural network that typically has one latent layer, on top of the last latent layer in the normal model to map the logits to the aligned confidence. Furthermore, a simple Annealing technique that dynamically scales the logits by…
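The double-head idea in the abstract can be illustrated with a minimal sketch: a main head produces class logits, and a separate shallow head with one hidden layer maps those logits to a single confidence value. This is only an illustration under assumptions, not the authors' implementation. The function names, layer sizes, ReLU activation, sigmoid output, and random untrained weights are all hypothetical. The annealing step is left out because the abstract is truncated before it describes the scaling.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def calibration_head(logits, w1, b1, w2, b2):
    """Hypothetical shallow head: logits -> one hidden layer (ReLU) -> sigmoid.

    Returns a scalar in (0, 1) meant to play the role of the confidence.
    The exact form used in the paper is not given in the abstract.
    """
    hidden = [
        max(0.0, sum(w * z for w, z in zip(row, logits)) + b)
        for row, b in zip(w1, b1)
    ]
    score = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-score))


# Untrained, randomly initialised weights (assumption: for illustration only).
rng = random.Random(0)
num_classes, hidden_size = 3, 4
w1 = [[rng.uniform(-0.5, 0.5) for _ in range(num_classes)]
      for _ in range(hidden_size)]
b1 = [0.0] * hidden_size
w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
b2 = 0.0

# Logits as they might come from the main classification head.
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)
predicted_class = max(range(num_classes), key=lambda k: probs[k])

# The main head decides the class; the calibration head reports confidence.
confidence = calibration_head(logits, w1, b1, w2, b2)
print(predicted_class, round(max(probs), 3), round(confidence, 3))
```

In this sketch the predicted class still comes from the main head's logits, while the reported confidence comes from the second head rather than from the maximum softmax probability. How the two heads are trained together is not covered by the abstract and is not shown here.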
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Speech and Audio Processing · Machine Learning and Data Classification
