On Calibration of Modern Neural Networks
Chuan Guo, Geoff Pleiss, Yu Sun, Kilian Q. Weinberger

TL;DR
This paper investigates the calibration of modern neural networks, revealing they are often poorly calibrated and demonstrating that simple methods like temperature scaling can effectively improve their probability estimates.
Contribution
It provides a comprehensive analysis of factors affecting calibration and shows that temperature scaling is a simple yet effective calibration method for current neural architectures.
Findings
Modern neural networks are poorly calibrated.
Factors like depth, width, weight decay, and BatchNorm influence calibration.
Temperature scaling effectively improves calibration across datasets.
Abstract
Confidence calibration -- the problem of predicting probability estimates representative of the true correctness likelihood -- is important for classification models in many applications. We discover that modern neural networks, unlike those from a decade ago, are poorly calibrated. Through extensive experiments, we observe that depth, width, weight decay, and Batch Normalization are important factors influencing calibration. We evaluate the performance of various post-processing calibration methods on state-of-the-art architectures with image and document classification datasets. Our analysis and experiments not only offer insights into neural network learning, but also provide a simple and straightforward recipe for practical settings: on most datasets, temperature scaling -- a single-parameter variant of Platt Scaling -- is surprisingly effective at calibrating predictions.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAnomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
