Comparing the quality of neural network uncertainty estimates for classification problems
Daniel Ries, Joshua Michalenko, Tyler Ganter, Rashad Imad-Fayez Baiyasi, Jason Adams

TL;DR
This paper evaluates and compares various uncertainty quantification methods for deep learning classifiers, highlighting the inconsistency among methods and emphasizing the need for principled quality assessment metrics.
Contribution
It introduces a framework for evaluating UQ methods in deep learning, comparing multiple approaches using statistical metrics on real and simulated data.
Findings
MCMC Bayesian neural networks perform best overall.
Bootstrapped neural networks are a close second in quality.
Different UQ methods can produce markedly different uncertainty estimates.
Abstract
Traditional deep learning (DL) models are powerful classifiers, but many approaches do not provide uncertainties for their estimates. Uncertainty quantification (UQ) methods for DL models have received increased attention in the literature due to their usefulness in decision making, particularly for high-consequence decisions. However, there has been little research done on how to evaluate the quality of such methods. We use statistical methods of frequentist interval coverage and interval width to evaluate the quality of credible intervals, and expected calibration error to evaluate classification predicted confidence. These metrics are evaluated on Bayesian neural networks (BNN) fit using Markov Chain Monte Carlo (MCMC) and variational inference (VI), bootstrapped neural networks (NN), Deep Ensembles (DE), and Monte Carlo (MC) dropout. We apply these different UQ for DL methods to a…
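The three metrics named in the abstract (interval coverage, interval width, and expected calibration error) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the equal-width binning with 10 bins, and the assumption that the true class probability is known (as it would be for simulated data) are all illustrative choices.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - confidence| per bin.

    confidences: predicted confidence of the chosen class, in [0, 1]
    correct:     1 if the prediction was right, 0 otherwise
    n_bins:      number of bins (10 is an assumed default, not from the paper)
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Right-closed bins (lo, hi]; a confidence of exactly 0 falls in bin 0.
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return ece

def interval_coverage_and_width(lower, upper, truth):
    """Fraction of intervals containing the true value, and mean interval width.

    Assumes `truth` (e.g. the true class probability) is known, which is
    typically only the case for simulated data.
    """
    lower, upper, truth = (np.asarray(a, dtype=float) for a in (lower, upper, truth))
    covered = (lower <= truth) & (truth <= upper)
    return covered.mean(), (upper - lower).mean()
```

Under this sketch, a 95% credible interval from a well-behaved UQ method would have coverage near 0.95 while keeping the mean width small; coverage and width need to be read together, since very wide intervals trivially achieve high coverage.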
Taxonomy
Topics: Statistical Methods and Inference · Fault Detection and Control Systems · Advanced Statistical Methods and Models
Methods: Variational Inference · Deep Ensembles
