Negative impact of heavy-tailed uncertainty and error distributions on the reliability of calibration statistics for machine learning regression tasks
Pascal Pernot

TL;DR
This paper investigates how heavy-tailed uncertainty and error distributions undermine the reliability of calibration statistics in machine learning regression. It shows that the mean squared z-scores (ZMS) statistic is more robust than the variance-based calibration error (CE) and proposes solutions for more reliable calibration assessment.
Contribution
It reveals the unreliability of variance-based calibration errors under heavy-tailed distributions and advocates for the use of ZMS, introducing solutions to enhance calibration reliability.
Findings
Variance-based calibration errors become unreliable with heavy tails.
ZMS provides a more robust calibration assessment in such cases.
Heavy-tailed distributions can compromise calibration metrics like ENCE.
Abstract
Average calibration of the (variance-based) prediction uncertainties of machine learning regression tasks can be tested in two ways: one is to estimate the calibration error (CE) as the difference between the mean squared error (MSE) and the mean variance (MV); the alternative is to compare the mean squared z-scores (ZMS) to 1. The problem is that the two approaches might lead to different conclusions, as illustrated in this study for an ensemble of datasets from the recent machine learning uncertainty quantification (ML-UQ) literature. It is shown that the estimation of MV, MSE and their confidence intervals becomes unreliable for heavy-tailed uncertainty and error distributions, which seems to be a frequent feature of ML-UQ datasets. By contrast, the ZMS statistic is less sensitive and offers the most reliable approach in this context, still acknowledging that datasets with heavy-tailed…
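The two tests described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the function name `calibration_stats`, the assumption that uncertainties are supplied as standard deviations (so that MV is the mean of their squares), and the synthetic heavy-tailed data are all hypothetical choices made for the example.

```python
import numpy as np

def calibration_stats(errors, uncertainties):
    """Return MSE, MV, CE = MSE - MV, and ZMS for paired errors and uncertainties.

    Assumption: `uncertainties` are standard deviations, so the variance is u**2.
    """
    e = np.asarray(errors, dtype=float)
    u = np.asarray(uncertainties, dtype=float)
    mse = np.mean(e**2)          # mean squared error
    mv = np.mean(u**2)           # mean variance
    zms = np.mean((e / u) ** 2)  # mean squared z-scores, compared to 1
    return {"MSE": mse, "MV": mv, "CE": mse - mv, "ZMS": zms}

# Synthetic, calibrated-by-construction data with heavy-tailed variances
# (inverse-gamma distributed u**2; the shape parameter nu is an arbitrary choice).
rng = np.random.default_rng(0)
n, nu = 5000, 3.0
u = np.sqrt(1.0 / rng.gamma(shape=nu / 2, scale=2 / nu, size=n))
errors = u * rng.standard_normal(n)  # each error is drawn with its own std u

stats = calibration_stats(errors, u)
print(stats)
```

Because every z-score here is a standard normal draw, ZMS should land close to 1, whereas MSE and MV are each dominated by a few very large variances, which makes their difference CE much noisier from sample to sample.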
Taxonomy
Topics: Machine Learning and Data Classification · Fault Detection and Control Systems · Advanced Statistical Methods and Models
