Calibrated and uncertain? Evaluating uncertainty estimates in binary classification models
Aurora Grefsrud, Nello Blaser, Trygve Buanes

TL;DR
This study evaluates six probabilistic machine learning algorithms for binary classification, focusing on their ability to produce well-calibrated and meaningful uncertainty estimates, especially for out-of-distribution data, using synthetic datasets.
Contribution
It provides a comparative analysis of various uncertainty estimation methods in deep learning, highlighting their strengths and limitations in calibration and out-of-distribution detection.
Findings
All algorithms showed good calibration on synthetic data.
Deep learning methods struggled to produce uncertainty estimates that reflect out-of-distribution inputs.
None of the methods consistently detected lack of evidence for OOD data.
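As an illustration of what "good calibration" means in the findings above, the following is a minimal sketch of a binned calibration check for a binary classifier. It is not taken from the paper: the summary does not say which calibration measure the authors used, so the expected calibration error (ECE) here is a commonly used stand-in, and the function name, bin count, and toy data generator are all assumptions made for this example.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=10):
    """Binned gap between predicted P(y=1) and the observed frequency of y=1.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Assign each prediction to one of n_bins equal-width bins on [0, 1];
    # a probability of exactly 1.0 is placed in the last bin.
    bin_ids = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue
        # Gap between mean predicted probability and empirical class frequency,
        # weighted by the fraction of samples that fall in this bin.
        gap = abs(probs[mask].mean() - labels[mask].mean())
        ece += mask.mean() * gap
    return float(ece)


# Toy synthetic data (assumption): labels are drawn from known probabilities,
# so predicting those probabilities is calibrated by construction.
rng = np.random.default_rng(0)
p_true = rng.uniform(0.0, 1.0, size=20_000)
y = (rng.uniform(size=p_true.size) < p_true).astype(int)

ece_calibrated = expected_calibration_error(p_true, y)

# An overconfident predictor: the same probabilities pushed toward 0 and 1.
p_over = np.clip(0.5 + 2.0 * (p_true - 0.5), 0.0, 1.0)
ece_overconfident = expected_calibration_error(p_over, y)

print(f"ECE, calibrated predictor:    {ece_calibrated:.4f}")
print(f"ECE, overconfident predictor: {ece_overconfident:.4f}")
```

A low ECE only says that predicted probabilities match observed frequencies on data like the test set. It says nothing about whether uncertainty grows for out-of-distribution inputs, which is the separate property the findings report as problematic.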
Abstract
Rigorous statistical methods, including parameter estimation with accompanying uncertainties, underpin the validity of scientific discovery, especially in the natural sciences. With increasingly complex data models such as deep learning techniques, uncertainty quantification has become exceedingly difficult and a plethora of techniques have been proposed. In this case study, we use the unifying framework of approximate Bayesian inference combined with empirical tests on carefully created synthetic classification datasets to investigate qualitative properties of six different probabilistic machine learning algorithms for class probability and uncertainty estimation: (i) a neural network ensemble, (ii) neural network ensemble with conflictual loss, (iii) evidential deep learning, (iv) a single neural network with Monte Carlo Dropout, (v) Gaussian process classification and (vi) a…
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Machine Learning in Materials Science · Probabilistic and Robust Engineering Design
