Uncertainty Quantification Metrics for Deep Regression
Simon Kristoffersson Lind, Ziliang Xiong, Per-Erik Forssén, Volker Krüger

TL;DR
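This paper evaluates four metrics for quantifying predictive uncertainty in deep regression models: Area Under Sparsification Error (AUSE), Calibration Error, Spearman's Rank Correlation, and Negative Log-Likelihood (NLL). It finds Calibration Error to be the most stable and interpretable, notes that AUSE and NLL have their own use cases, and discourages Spearman's Rank Correlation.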
Contribution
Using synthetic regression datasets, the paper analyses how these uncertainty metrics behave under four typical types of uncertainty and how stable they are with respect to the size of the test set, and it sets out their strengths and weaknesses.
Findings
Calibration Error is the most stable and interpretable metric.
AUSE and NLL have specific useful applications.
Spearman's Rank Correlation is not recommended for uncertainty evaluation.
Abstract
When deploying deep neural networks on robots or other physical systems, the learned model should reliably quantify predictive uncertainty. A reliable uncertainty allows downstream modules to reason about the safety of its actions. In this work, we address metrics for evaluating such an uncertainty. Specifically, we focus on regression tasks, and investigate Area Under Sparsification Error (AUSE), Calibration Error, Spearman's Rank Correlation, and Negative Log-Likelihood (NLL). Using synthetic regression datasets, we look into how those metrics behave under four typical types of uncertainty, their stability regarding the size of the test set, and reveal their strengths and weaknesses. Our results indicate that Calibration Error is the most stable and interpretable metric, but AUSE and NLL also have their respective use cases. We discourage the usage of Spearman's Rank Correlation for…
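The four metrics named in the abstract can be sketched in a few lines. What follows is a minimal illustration, not the paper's implementation: it assumes a Gaussian predictive distribution with per-sample mean and standard deviation, uses one common quantile-based formulation of calibration error, and uses an unnormalised AUSE based on mean absolute error. The function names (gaussian_nll, calibration_error, spearman, ause) and the synthetic data are hypothetical and chosen only for this example.

```python
import math
import random


def gaussian_nll(y, mu, sigma):
    """Mean negative log-likelihood of targets under N(mu_i, sigma_i^2)."""
    total = 0.0
    for yi, mi, si in zip(y, mu, sigma):
        total += 0.5 * math.log(2.0 * math.pi * si**2) + (yi - mi) ** 2 / (2.0 * si**2)
    return total / len(y)


def calibration_error(y, mu, sigma, n_levels=19):
    """Mean absolute gap between expected and observed CDF coverage."""
    # Predictive CDF evaluated at each target (Gaussian assumption).
    cdf = [0.5 * (1.0 + math.erf((yi - mi) / (si * math.sqrt(2.0))))
           for yi, mi, si in zip(y, mu, sigma)]
    gaps = []
    for k in range(1, n_levels + 1):
        p = k / (n_levels + 1)  # expected coverage level in (0, 1)
        observed = sum(c <= p for c in cdf) / len(cdf)
        gaps.append(abs(observed - p))
    return sum(gaps) / len(gaps)


def _ranks(values):
    """Ranks 0..n-1; ties are not averaged (acceptable for continuous data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks


def spearman(a, b):
    """Spearman rank correlation, computed as Pearson correlation of ranks."""
    ra, rb = _ranks(a), _ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(ra, rb))
    va = sum((u - ma) ** 2 for u in ra)
    vb = sum((v - mb) ** 2 for v in rb)
    return cov / math.sqrt(va * vb)


def ause(abs_err, sigma, n_steps=20):
    """Unnormalised area between the sparsification and oracle curves."""
    n = len(abs_err)
    # Errors ordered by predicted uncertainty, and by the errors themselves.
    by_sigma = [e for _, e in sorted(zip(sigma, abs_err), key=lambda t: t[0])]
    by_error = sorted(abs_err)
    area = 0.0
    for k in range(n_steps):
        keep = n - (k * n) // n_steps  # drop the k/n_steps highest-ranked samples
        area += sum(by_sigma[:keep]) / keep - sum(by_error[:keep]) / keep
    return area / n_steps


# Synthetic heteroscedastic data (an assumption made for this sketch only).
random.seed(0)
n = 2000
sigma_true = [0.1 + 0.9 * random.random() for _ in range(n)]
mu = [0.0] * n                                 # predicted means
y = [random.gauss(0.0, s) for s in sigma_true]  # targets
sigma_over = [0.5 * s for s in sigma_true]      # overconfident predictions
abs_err = [abs(yi - mi) for yi, mi in zip(y, mu)]

for name, sig in [("calibrated", sigma_true), ("overconfident", sigma_over)]:
    print(name,
          gaussian_nll(y, mu, sig),
          calibration_error(y, mu, sig),
          spearman(abs_err, sig),
          ause(abs_err, sig))
```

In this sketch, halving every predicted standard deviation changes NLL and the calibration error, while the two rank-based scores (Spearman and AUSE) stay the same because they depend only on the ordering of the uncertainties. That is a property of these particular formulations and is not presented here as the paper's own argument.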
Taxonomy
Topics: Fault Detection and Control Systems
Methods: Focus
