How to Evaluate Uncertainty Estimates in Machine Learning for Regression?
Laurens Sluijterman, Eric Cator, Tom Heskes

TL;DR
This paper critically examines current methods for evaluating uncertainty estimates in regression neural networks, revealing fundamental flaws and proposing a simulation-based testing approach for more reliable assessment.
Contribution
It identifies key flaws in existing evaluation metrics for uncertainty estimates and introduces a novel simulation-based testing method to improve assessment accuracy.
Findings
Current evaluation methods cannot disentangle components of predictive uncertainty.
Better loglikelihood does not necessarily mean better prediction intervals.
Testing prediction intervals on a single test set is fundamentally flawed.
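One way an aggregate coverage number can mislead is sketched below. This is a toy illustration under stated assumptions, not an experiment from the paper: the data-generating process (`np.sin` mean, two noise levels), the constant-width interval, and all variable names are hypothetical. The interval is tuned so that about 95% of test points fall inside it overall, yet coverage conditional on the input region is far from 95%.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_sigma(x):
    # Assumed heteroscedastic noise: small for x < 0, large for x >= 0.
    return np.where(x < 0, 0.5, 1.5)

def sample(n):
    # Assumed simulation: known mean function sin(x) plus Gaussian noise.
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(x) + true_sigma(x) * rng.standard_normal(n)
    return x, y

# "Model": correct mean, but one constant interval half-width for all x,
# chosen so that 95% of held-out residuals fall inside it.
x_cal, y_cal = sample(20_000)
half_width = np.quantile(np.abs(y_cal - np.sin(x_cal)), 0.95)

# Evaluate on a fresh test set.
x_te, y_te = sample(20_000)
inside = np.abs(y_te - np.sin(x_te)) <= half_width

marginal = inside.mean()            # coverage averaged over all x
cov_low = inside[x_te < 0].mean()   # coverage where noise is small
cov_high = inside[x_te >= 0].mean() # coverage where noise is large

print(f"marginal coverage: {marginal:.3f}")
print(f"coverage for x < 0:  {cov_low:.3f}")
print(f"coverage for x >= 0: {cov_high:.3f}")
```

Because the simulation exposes the true noise level at every input, the mismatch is visible here; with a single real test set, only the aggregate number would be available.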
Abstract
As neural networks become more popular, the need for accompanying uncertainty estimates increases. There are currently two main approaches to test the quality of these estimates. Most methods output a density. They can be compared by evaluating their loglikelihood on a test set. Other methods output a prediction interval directly. These methods are often tested by examining the fraction of test points that fall inside the corresponding prediction intervals. Intuitively, both approaches seem logical. However, we demonstrate through both theoretical arguments and simulations that both ways of evaluating the quality of uncertainty estimates have serious flaws. Firstly, neither approach can disentangle the separate components that jointly create the predictive uncertainty, making it difficult to evaluate the quality of the estimates of these components. Secondly, a better loglikelihood…
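The two evaluation approaches described in the abstract can be sketched as follows. This is a minimal sketch assuming a Gaussian predictive density; the function names `mean_gaussian_loglik` and `picp` and the simulated data are hypothetical and chosen only for illustration.

```python
import numpy as np
from statistics import NormalDist

def mean_gaussian_loglik(y, mu, sigma):
    # Average log density of y under N(mu, sigma^2), one (mu, sigma) per point.
    return np.mean(-0.5 * np.log(2 * np.pi * sigma**2)
                   - (y - mu) ** 2 / (2 * sigma**2))

def picp(y, lower, upper):
    # Fraction of test targets that fall inside their prediction interval.
    return np.mean((y >= lower) & (y <= upper))

# Assumed toy test set: targets are standard normal and the "model"
# predicts the correct mean 0 and standard deviation 1 everywhere.
rng = np.random.default_rng(0)
y = rng.standard_normal(50_000)
mu = np.zeros_like(y)
sigma = np.ones_like(y)

z = NormalDist().inv_cdf(0.975)  # two-sided 95% Gaussian quantile
loglik = mean_gaussian_loglik(y, mu, sigma)
coverage = picp(y, mu - z * sigma, mu + z * sigma)

print(f"mean loglikelihood: {loglik:.3f}")
print(f"95% interval coverage: {coverage:.3f}")
```

Both quantities are single averages over the test set, which is why neither can separate the individual components of the predictive uncertainty on its own.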
Taxonomy
Topics: Machine Learning and Data Classification · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms
