Recipes for Calibration Checks in Safety-Critical Applications
Romeo Valentin

TL;DR
This paper introduces a modular framework for calibration checks of probabilistic forecasts in safety-critical systems, enabling reliable validation of distributional assumptions with operationally friendly decision criteria.
Contribution
It proposes a flexible, step-by-step calibration testing pipeline that supports various use-cases and includes modifications for safety-critical applications, demonstrated on weather and robotics tasks.
Findings
The framework reduces calibration validation to a single accept/reject decision.
Modifications for safety-critical use reject overconfident predictions while tolerating small deviations.
The framework is demonstrated on weather forecasting and robot pose estimation.
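To make the second finding concrete, here is a minimal sketch of one way a check could reject overconfidence while tolerating small deviations. It is not the paper's procedure. It assumes Gaussian forecasts and uses a one-sided binomial test on the coverage of a central interval. The function name `coverage_check` and the parameters `level`, `tolerance`, and `alpha` are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy import stats


def coverage_check(y_true, mu, sigma, level=0.9, tolerance=0.02, alpha=0.05):
    """One-sided coverage check (illustrative, not the paper's method).

    Rejects only if the central `level` interval covers significantly
    fewer outcomes than `level - tolerance`. Coverage that is too high
    (underconfidence) is never rejected by this check.
    """
    z = stats.norm.ppf(0.5 + level / 2.0)        # half-width in units of sigma
    inside = np.abs(y_true - mu) <= z * sigma    # outcome falls inside the interval
    k, n = int(inside.sum()), inside.size
    # p-value for H0: coverage >= level - tolerance, alternative: coverage is lower
    p_value = stats.binom.cdf(k, n, level - tolerance)
    return bool(p_value >= alpha), k / n


# Synthetic data: the true noise has standard deviation 1.
rng = np.random.default_rng(0)
n = 2000
mu = rng.normal(size=n)
y = mu + rng.normal(size=n)

accept_over, cov_over = coverage_check(y, mu, np.full(n, 0.5))    # claims too little spread
accept_under, cov_under = coverage_check(y, mu, np.full(n, 2.0))  # claims too much spread
print("overconfident:", accept_over, round(cov_over, 3))
print("underconfident:", accept_under, round(cov_under, 3))
```

The test is one-sided because, in this sketch, only overconfidence is treated as a safety problem. The `tolerance` margin keeps a forecaster that is very slightly off from being rejected as the sample size grows.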
Abstract
Safety-critical prediction systems, such as autonomous vehicles, weather forecasters, and medical monitors, commonly rely on probabilistic forecasters. These forecasters make predictions about possible future outcomes, and their quality and robustness need to be validated and certified. Often, only accuracy -- the mean of the predictions -- is evaluated against true outcomes. However, for safety-critical scenarios and decision making under uncertainty, the full distributional properties of the forecasts should be checked: do the observed prediction errors actually follow the forecasted probability distributions? To this end, we introduce a framework for calibration checks: statistical tests that validate distributional properties of forecasts when measured over many samples. In order to support ease-of-use in real-world operations, these checks produce a single accept/reject decision…
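As an illustration of what a calibration check with a single accept/reject output can look like, the following sketch uses a standard textbook approach rather than the paper's pipeline: it computes probability integral transform (PIT) values of the outcomes under Gaussian forecasts and tests them for uniformity with a Kolmogorov-Smirnov test. The function name `pit_calibration_check`, the Gaussian assumption, and the significance level `alpha` are assumptions made for this example.

```python
import numpy as np
from scipy import stats


def pit_calibration_check(y_true, mu, sigma, alpha=0.05):
    """Accept/reject calibration via PIT uniformity (illustrative sketch).

    If the forecasts N(mu, sigma^2) are calibrated, the forecast CDF
    evaluated at each observed outcome is uniformly distributed on [0, 1].
    """
    pit = stats.norm.cdf(y_true, loc=mu, scale=sigma)
    result = stats.kstest(pit, "uniform")   # KS test against Uniform(0, 1)
    return bool(result.pvalue >= alpha), float(result.pvalue)


# Synthetic data: the true noise has standard deviation 1.
rng = np.random.default_rng(0)
n = 2000
mu = rng.normal(size=n)
y = mu + rng.normal(size=n)

accept_ok, p_ok = pit_calibration_check(y, mu, np.ones(n))            # correct spread
accept_over, p_over = pit_calibration_check(y, mu, np.full(n, 0.5))   # overconfident
print("matching sigma:", accept_ok, p_ok)
print("overconfident sigma:", accept_over, p_over)
```

Note that this two-sided test rejects any detectable deviation once enough samples are available, including deviations too small to matter operationally.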
