Uncertainty Quantification Metrics for Deep Regression

Simon Kristoffersson Lind; Ziliang Xiong; Per-Erik Forssén; Volker Krüger

arXiv:2405.04278 · cs.LG · October 30, 2024

TL;DR

This paper evaluates four metrics for quantifying predictive uncertainty in deep regression models, finds Calibration Error to be the most stable and interpretable, and recommends which metrics to use in practice.

Contribution

It provides a comparative analysis of uncertainty metrics for deep regression, using synthetic datasets to assess their stability with respect to test-set size, their interpretability, and their suitability for different types of uncertainty.

Findings

01. Calibration Error is the most stable and interpretable metric.
02. AUSE and NLL each have specific use cases.
03. Spearman's Rank Correlation is not recommended for uncertainty evaluation.
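The findings rank Calibration Error first and Spearman's Rank Correlation last. The following is a minimal sketch of what these two metrics compute. It assumes Gaussian predictive distributions, with a mean `mu` and a standard deviation `sigma` for each test sample, and uses one common quantile-coverage formulation of Calibration Error: the mean absolute gap between each confidence level p and the observed fraction of targets that fall below the predicted p-quantile. The paper's exact definition (number of levels, weighting, absolute or squared gaps) may differ. All function names here are hypothetical. Spearman's Rank Correlation is computed between predicted `sigma` and absolute error, and the sketch assumes the inputs contain no ties.

```python
import math
import numpy as np

def gaussian_cdf(y, mu, sigma):
    """Per-sample CDF of N(mu, sigma^2) evaluated at the observed target y."""
    z = (y - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + np.array([math.erf(v) for v in z]))

def calibration_error(y, mu, sigma, n_levels=19):
    """Mean |p - observed coverage| over evenly spaced confidence levels p.

    Assumption: one-sided quantile coverage with unweighted absolute gaps;
    the paper's exact formulation may differ.
    """
    levels = np.linspace(0.05, 0.95, n_levels)  # expected coverage p_j
    pit = gaussian_cdf(y, mu, sigma)            # F_i(y_i) for each sample
    observed = np.array([(pit <= p).mean() for p in levels])
    return float(np.mean(np.abs(levels - observed)))

def spearman_rank_correlation(a, b):
    """Pearson correlation between the ranks of a and b (assumes no ties)."""
    rank_a = np.argsort(np.argsort(a)).astype(float)
    rank_b = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 10_000
    mu = np.zeros(n)
    sigma = rng.uniform(0.5, 2.0, n)  # heteroscedastic noise, predicted correctly
    y = mu + sigma * rng.standard_normal(n)
    print("Calibration error, calibrated:   ", calibration_error(y, mu, sigma))
    print("Calibration error, overconfident:", calibration_error(y, mu, sigma / 3))
    print("Spearman(sigma, |error|):        ", spearman_rank_correlation(sigma, np.abs(y - mu)))
```

On calibrated synthetic data the calibration error should be close to zero. Shrinking `sigma` threefold simulates an overconfident model, and this raises the error noticeably. The Spearman value only reflects whether larger `sigma` goes with larger errors. It does not reflect whether `sigma` has the right scale.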

Abstract

When deploying deep neural networks on robots or other physical systems, the learned model should reliably quantify predictive uncertainty. A reliable uncertainty estimate allows downstream modules to reason about the safety of the system's actions. In this work, we address metrics for evaluating such uncertainty. Specifically, we focus on regression tasks and investigate Area Under Sparsification Error (AUSE), Calibration Error, Spearman's Rank Correlation, and Negative Log-Likelihood (NLL). Using synthetic regression datasets, we examine how these metrics behave under four typical types of uncertainty and how stable they are with respect to the size of the test set, revealing their strengths and weaknesses. Our results indicate that Calibration Error is the most stable and interpretable metric, but AUSE and NLL also have their respective use cases. We discourage the usage of Spearman's Rank Correlation for…
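AUSE and NLL, two of the metrics named in the abstract, can be sketched under a Gaussian assumption, with a predicted mean `mu` and standard deviation `sigma` per sample. For AUSE, samples are removed in order of decreasing predicted uncertainty, and the mean error of the remaining samples is tracked at each step. The same curve computed with samples ordered by their true error is the oracle. AUSE is the area between the two curves. This sketch makes several assumptions, and the paper may use a different error measure or normalization:

- mean absolute error as the error measure
- 20 evenly spaced removal fractions
- unnormalized curves
- trapezoidal integration

The function names are also hypothetical. NLL is taken as the mean Gaussian negative log-likelihood of the targets.

```python
import math
import numpy as np

def gaussian_nll(y, mu, sigma):
    """Mean negative log-likelihood of targets y under N(mu, sigma^2)."""
    var = sigma ** 2
    nll = 0.5 * np.log(2.0 * math.pi * var) + (y - mu) ** 2 / (2.0 * var)
    return float(np.mean(nll))

def sparsification_curve(errors, order, fractions):
    """Mean error of the samples left after removing a fraction of them in `order`."""
    n = len(errors)
    curve = []
    for f in fractions:
        n_removed = min(int(round(f * n)), n - 1)  # always keep at least one sample
        curve.append(errors[order[n_removed:]].mean())
    return np.array(curve)

def ause(y, mu, sigma, n_steps=20):
    """Area between the uncertainty-based and oracle sparsification curves (MAE-based)."""
    errors = np.abs(y - mu)
    fractions = np.linspace(0.0, 1.0, n_steps, endpoint=False)
    by_uncertainty = np.argsort(-sigma)  # most uncertain sample removed first
    by_oracle = np.argsort(-errors)      # largest true error removed first
    gap = (sparsification_curve(errors, by_uncertainty, fractions)
           - sparsification_curve(errors, by_oracle, fractions))
    # Trapezoidal rule over the removal fractions.
    return float(np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(fractions)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 10_000
    mu = np.zeros(n)
    sigma = rng.uniform(0.5, 2.0, n)
    y = mu + sigma * rng.standard_normal(n)
    print("AUSE, informative sigma:", ause(y, mu, sigma))
    print("AUSE, random sigma:     ", ause(y, mu, rng.uniform(0.5, 2.0, n)))
    print("NLL, calibrated:        ", gaussian_nll(y, mu, sigma))
    print("NLL, overconfident:     ", gaussian_nll(y, mu, sigma / 3))
```

Removing the largest true errors first always leaves the smallest possible mean error, so the oracle curve lies at or below the uncertainty-based curve. AUSE is therefore zero when `sigma` ranks samples exactly like their errors, and positive otherwise. AUSE depends only on the ranking of `sigma`, not on its scale. NLL, in contrast, penalizes a mis-scaled `sigma`.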
