Probabilistic measures afford fair comparisons of AIWP and NWP model output
Tilmann Gneiting, Tobias Biegert, Kristof Kraus, Eva-Maria Walz, Alexander I. Jordan, Sebastian Lerch

TL;DR
The paper introduces the potential continuous ranked probability score (PC), a new measure for fairly comparing AI-based weather prediction (AIWP) and numerical weather prediction (NWP) models, and demonstrates its use on WeatherBench 2 data.
Contribution
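It proposes a novel measure (PC) for comparing deterministic model outputs after subjecting them to the same statistical postprocessing. Because PC is invariant under strictly increasing transformations of the model output, it enables fair and meaningful model evaluations.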
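Written out from the abstract's description, and assuming n forecast cases (x_i, y_i) with model output x_i, observed outcome y_i, and \hat F_x the conditional CDF fitted by isotonic distributional regression (IDR), the measure can be sketched as follows:

```latex
\mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \big( F(z) - \mathbb{1}\{ y \le z \} \big)^2 \, dz,
\qquad
\mathrm{PC} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{CRPS}\big( \hat F_{x_i}, y_i \big).
```

The integrand is a square, so PC is nonnegative. IDR uses the model output only through its ordering, so a strictly increasing transformation of x leaves the fitted CDFs, and hence PC, unchanged.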
Findings
GraphCast outperforms ECMWF HRES in terms of PC.
PC values align well with the CRPS of operational ensemble forecasts.
The method enables fair comparisons without predefined loss functions.
Abstract
We introduce a new measure for fair and meaningful comparisons of single-valued output from artificial intelligence-based weather prediction (AIWP) and numerical weather prediction (NWP) models, called potential continuous ranked probability score (PC). In a nutshell, we subject the deterministic backbone of physics-based and data-driven models post hoc to the same statistical postprocessing technique, namely, isotonic distributional regression (IDR). Then we find PC as the mean continuous ranked probability score (CRPS) of the postprocessed probabilistic forecasts. The nonnegative PC measure quantifies potential predictive performance and is invariant under strictly increasing transformations of the model output. PC attains its most desirable value of zero if, and only if, the weather outcome Y is a fixed, non-decreasing function of the model output X. The PC measure is recorded in the…
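The two steps in the abstract (fit IDR to the model output, then average the CRPS) can be illustrated with a small pure-Python sketch. It assumes a single real-valued output per case and fits IDR in-sample on the same cases it scores. That reading of "potential" performance is our assumption, not a detail stated above. The function names `antitonic_pava` and `potential_crps` are hypothetical. The sketch also uses a hand-written pool-adjacent-violators routine in place of any published IDR package: for each threshold z, the IDR CDF value is a non-increasing fit of the indicators 1{y <= z} in x, with tied outputs pooled.

```python
from collections import defaultdict


def antitonic_pava(values, weights):
    """Weighted pool-adjacent-violators: best non-increasing fit to `values`."""
    blocks = []  # each block: [weighted mean, total weight, number of entries]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Pool adjacent blocks while the non-increasing constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            v2, w2, c2 = blocks.pop()
            v1, w1, c1 = blocks.pop()
            w_tot = w1 + w2
            blocks.append([(v1 * w1 + v2 * w2) / w_tot, w_tot, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def potential_crps(x, y):
    """Mean in-sample CRPS of IDR forecasts of y given scalar model output x."""
    groups = defaultdict(list)  # outcomes grouped by (tied) model output
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    xs = sorted(groups)   # distinct model outputs, increasing
    zs = sorted(set(y))   # IDR thresholds: distinct observed outcomes
    sizes = [len(groups[u]) for u in xs]

    # cdf[j][k]: IDR conditional CDF at threshold zs[k] given output xs[j].
    # Larger x should mean stochastically larger y, so for each threshold
    # the CDF is an antitonic (non-increasing) regression in x.
    cdf = [[0.0] * len(zs) for _ in xs]
    for k, z in enumerate(zs):
        frac = [sum(yi <= z for yi in groups[u]) / len(groups[u]) for u in xs]
        for j, f in enumerate(antitonic_pava(frac, sizes)):
            cdf[j][k] = f

    # CRPS(F, y) = integral of (F(t) - 1{y <= t})^2 dt. F is a step function
    # with jumps at zs, equal to 0 below zs[0] and 1 from zs[-1] on, and each
    # in-sample y lies in [zs[0], zs[-1]], so only interior gaps contribute.
    row = {u: j for j, u in enumerate(xs)}
    total = 0.0
    for xi, yi in zip(x, y):
        F = cdf[row[xi]]
        for k in range(len(zs) - 1):
            step = 1.0 if yi <= zs[k] else 0.0
            total += (zs[k + 1] - zs[k]) * (F[k] - step) ** 2
    return total / len(x)
```

For example, `potential_crps([1, 2, 3, 4], [10, 20, 30, 40])` returns `0.0` because the outcome is an increasing function of the output. By contrast, `potential_crps([1, 2], [2, 1])` returns `0.25`. Replacing x by `[v**3 for v in x]` (positive x) leaves the result unchanged, matching the stated invariance. The threshold loop is O(n·m) and is meant only for illustration, not for WeatherBench 2-scale data.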
