Recipes for Calibration Checks in Safety-Critical Applications

Romeo Valentin

arXiv:2604.26479·stat.ME·April 30, 2026

TL;DR

This paper introduces a modular framework for calibration checks of probabilistic forecasts in safety-critical systems, enabling reliable validation of distributional assumptions with operationally friendly decision criteria.

Contribution

It proposes a flexible, step-by-step calibration testing pipeline that supports various use-cases and includes modifications for safety-critical applications, demonstrated on weather and robotics tasks.

Findings

01. Framework supports single accept/reject decision for calibration validation.

02. Modifications reject overconfident predictions and tolerate small deviations.

03. Demonstrated on weather forecasting and robot pose estimation.
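
To make the second finding concrete, here is one way a check could reject overconfident forecasts while tolerating small deviations. It is a minimal sketch and not the paper's procedure: it assumes Gaussian forecasts, counts how many outcomes land inside the central prediction interval, and runs a one-sided binomial test against a coverage level lowered by a tolerance. The function names, the 90% nominal level, the 0.02 tolerance, and the 0.05 significance level are all illustrative assumptions.

```python
import math
from statistics import NormalDist


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    total = 0.0
    for i in range(k + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log(1.0 - p)
        )
        total += math.exp(log_pmf)
    return min(1.0, total)


def one_sided_coverage_check(outcomes, means, sigmas,
                             nominal=0.9, tolerance=0.02, alpha=0.05):
    """Hypothetical helper: reject only if coverage is significantly too LOW.

    Assumes each forecast is Gaussian with the given mean and standard deviation.
    Null hypothesis: true coverage >= nominal - tolerance.
    Too-wide (underconfident) forecasts are never rejected by this check.
    """
    # Half-width of the central `nominal` interval in units of sigma.
    z = NormalDist().inv_cdf(0.5 + nominal / 2.0)
    hits = sum(1 for y, m, s in zip(outcomes, means, sigmas) if abs(y - m) <= z * s)
    n = len(outcomes)
    # Probability of seeing this few hits if coverage were at the tolerated floor.
    p_value = binom_cdf(hits, n, nominal - tolerance)
    accept = p_value >= alpha
    return accept, hits / n, p_value


if __name__ == "__main__":
    # Toy data: half of the outcomes fall far outside the forecast intervals.
    outcomes = [0.0] * 50 + [10.0] * 50
    accept, coverage, p_value = one_sided_coverage_check(
        outcomes, [0.0] * 100, [1.0] * 100
    )
    print("accept:", accept, "coverage:", coverage, "p-value:", p_value)
```

The one-sided test means that only under-coverage, the signature of overconfidence, counts as evidence against the forecaster. The tolerance shifts the null hypothesis so that a forecaster whose coverage is slightly below nominal is not rejected simply because the sample is large.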

Abstract

Safety-critical prediction systems, such as autonomous vehicles, weather forecasters, and medical monitors, commonly rely on probabilistic forecasters. These forecasters make predictions about possible future outcomes, and their quality and robustness need to be validated and certified. Often, only accuracy (the mean of the predictions) is evaluated against true outcomes. However, for safety-critical scenarios and decision making under uncertainty, the full distributional properties of the forecasts should be checked: do the observed prediction errors actually follow the forecasted probability distributions? To this end, we introduce a framework for calibration checks: statistical tests that validate distributional properties of forecasts when measured over many samples. To support ease of use in real-world operations, these checks produce a single accept/reject decision…
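
The question the abstract poses, whether observed errors follow the forecasted distributions, can be illustrated with a standard textbook construction. The sketch below is not the paper's pipeline. It assumes Gaussian forecasts, maps each outcome through its forecast CDF (the probability integral transform, PIT), and applies a Kolmogorov-Smirnov test of the PIT values against the uniform distribution to reach one accept/reject decision. All function names and the 0.05 significance level are hypothetical choices for illustration.

```python
import math
import random
from statistics import NormalDist


def ks_statistic_uniform(values):
    """Kolmogorov-Smirnov distance between a sample and Uniform(0, 1)."""
    u = sorted(values)
    n = len(u)
    d = 0.0
    for i, x in enumerate(u):
        # Compare the empirical CDF just after and just before each point.
        d = max(d, (i + 1) / n - x, x - i / n)
    return d


def kolmogorov_pvalue(d, n, terms=100):
    """Asymptotic KS p-value with Stephens' small-sample correction."""
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        # The p-value is numerically 1 here and the series converges slowly.
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def calibration_check(outcomes, means, sigmas, alpha=0.05):
    """Hypothetical helper: accept iff PIT values look uniform at level alpha.

    Assumes each forecast is Gaussian with the given mean and standard deviation.
    """
    pit = [NormalDist(m, s).cdf(y) for y, m, s in zip(outcomes, means, sigmas)]
    d = ks_statistic_uniform(pit)
    p_value = kolmogorov_pvalue(d, len(pit))
    return p_value >= alpha, d, p_value


if __name__ == "__main__":
    random.seed(0)
    n = 2000
    outcomes = [random.gauss(0.0, 1.0) for _ in range(n)]
    # Forecaster A states the true spread; forecaster B claims half of it.
    print("matched spread:", calibration_check(outcomes, [0.0] * n, [1.0] * n))
    print("overconfident: ", calibration_check(outcomes, [0.0] * n, [0.5] * n))
```

A two-sided test like this one flags any departure from uniformity, including forecasts that are too cautious, and with enough samples it will eventually reject even negligible miscalibration. That behavior is a property of this sketch and motivates why a safety-oriented check may want different decision criteria.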
