Negative impact of heavy-tailed uncertainty and error distributions on the reliability of calibration statistics for machine learning regression tasks

Pascal Pernot

arXiv:2402.10043 · stat.ML · August 20, 2024 · 2 cites

TL;DR

This paper investigates how heavy-tailed uncertainty and error distributions affect the reliability of calibration statistics in machine learning regression. It finds the mean squared z-scores (ZMS) statistic more robust than the variance-based calibration error and proposes solutions for improved calibration assessment.

Contribution

It reveals the unreliability of variance-based calibration errors under heavy-tailed distributions and advocates for the use of ZMS, introducing solutions to enhance calibration reliability.

Findings

01

Variance-based calibration errors become unreliable with heavy tails.

02

ZMS provides a more robust calibration assessment in such cases.

03

Heavy-tailed distributions can compromise calibration metrics like ENCE.

Abstract

Average calibration of the (variance-based) prediction uncertainties of machine learning regression tasks can be tested in two ways: one is to estimate the calibration error (CE) as the difference between the mean squared error (MSE) and the mean variance (MV); the alternative is to compare the mean squared z-scores (ZMS) to 1. The problem is that the two approaches might lead to different conclusions, as illustrated in this study for an ensemble of datasets from the recent machine learning uncertainty quantification (ML-UQ) literature. It is shown that the estimation of MV, MSE and their confidence intervals becomes unreliable for heavy-tailed uncertainty and error distributions, which seems to be a frequent feature of ML-UQ datasets. By contrast, the ZMS statistic is less sensitive and offers the most reliable approach in this context, still acknowledging that datasets with heavy-tailed…
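The two tests described in the abstract can be sketched in a few lines. This is a minimal illustration, not the code from the paper's repository: the function name `calibration_stats`, the sign convention CE = MSE − MV, and the toy data are assumptions made here. Errors are taken as prediction minus reference, and uncertainties as predicted standard deviations, so the z-score of a point is its error divided by its uncertainty.

```python
import numpy as np


def calibration_stats(errors, uncertainties):
    """Return (CE, ZMS) for paired errors and predicted standard deviations.

    Hypothetical helper for illustration only.
    CE  = MSE - MV  (sign convention assumed here), compared to 0.
    ZMS = mean of squared z-scores, compared to 1.
    """
    e = np.asarray(errors, dtype=float)
    u = np.asarray(uncertainties, dtype=float)
    if e.shape != u.shape:
        raise ValueError("errors and uncertainties must have the same shape")
    if np.any(u <= 0):
        raise ValueError("uncertainties must be strictly positive")

    mse = np.mean(e**2)          # mean squared error
    mv = np.mean(u**2)           # mean (predicted) variance
    ce = mse - mv                # calibration error
    zms = np.mean((e / u) ** 2)  # mean squared z-score
    return ce, zms


# Toy data (not from the paper): the large error is paired with the
# small uncertainty, and vice versa.
ce, zms = calibration_stats(errors=[1.0, 2.0], uncertainties=[2.0, 1.0])
print(f"CE  = {ce:.3f}")   # CE  = 0.000
print(f"ZMS = {zms:.3f}")  # ZMS = 2.125
```

In this toy case CE is exactly 0, which would suggest average calibration, while ZMS is well above 1. It only shows that the two statistics need not agree; it does not reproduce the heavy-tail effect on estimation reliability that the paper studies.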

Code & Models

Repositories

ppernot/2024_rce
Official

Taxonomy

Topics: Machine Learning and Data Classification · Fault Detection and Control Systems · Advanced Statistical Methods and Models