A Practical Probabilistic Benchmark for AI Weather Models
Noah D. Brenowitz, Yair Cohen, Jaideep Pathak, Ankur Mahesh, Boris Bonev, Thorsten Kurth, Dale R. Durran, Peter Harrington, and Michael S. Pritchard

TL;DR
This paper introduces a practical, parameter-free benchmark for evaluating the probabilistic skill of AI weather models using lagged ensembles, revealing insights into model performance and training strategies.
Contribution
It applies lagged ensembles, a decades-old idea, to enable a fair probabilistic comparison of AI weather models, and it analyzes the impact of loss functions and resolution on forecast quality.
Findings
GraphCast and Pangu are tied on probabilistic CRPS.
Multiple time-step loss functions reduce probabilistic skill.
Model resolution affects ensemble calibration.
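For reference, the CRPS cited in the findings can be estimated from a finite ensemble with the standard empirical formula, CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|. The sketch below is a minimal NumPy illustration of that estimator. The function name `ensemble_crps` and the array layout are assumptions made here, and the paper's exact estimator (for example, a bias-corrected variant) may differ.

```python
import numpy as np


def ensemble_crps(members, observation):
    """Empirical CRPS of an ensemble against an observation.

    members: array of shape (n_members, ...), ensemble values (assumed layout).
    observation: array broadcastable to members.shape[1:].
    Returns an array of shape members.shape[1:]; lower is better.
    """
    members = np.asarray(members, dtype=float)
    obs = np.asarray(observation, dtype=float)

    # Mean absolute error of the members against the observation.
    skill = np.mean(np.abs(members - obs), axis=0)

    # Mean absolute difference over all member pairs (including i == j).
    pairwise = np.abs(members[:, None] - members[None, :])
    spread = np.mean(pairwise, axis=(0, 1))

    return skill - 0.5 * spread
```

Lower values are better. For a single-member ensemble the spread term vanishes and the score reduces to the absolute error, so deterministic and ensemble forecasts can be scored on the same scale.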
Abstract
Since the weather is chaotic, forecasts aim to predict the distribution of future states rather than make a single prediction. Recently, multiple data-driven weather models have emerged claiming breakthroughs in skill. However, these have mostly been benchmarked using deterministic skill scores, and little is known about their probabilistic skill. Unfortunately, it is hard to fairly compare AI weather models in a probabilistic sense, since variations in choice of ensemble initialization, definition of state, and noise injection methodology become confounding. Moreover, even obtaining ensemble forecast baselines is a substantial engineering challenge given the data volumes involved. We sidestep both problems by applying a decades-old idea -- lagged ensembles -- whereby an ensemble can be constructed from a moderately-sized library of deterministic forecasts. This allows the first…
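The lagged-ensemble idea in the abstract can be illustrated as follows: forecasts initialized at successively earlier times, but all valid at the same time, are pooled as ensemble members. This is a minimal sketch under assumed conventions (a hypothetical `forecasts[init, lead, grid]` array, a hypothetical `lagged_ensemble` helper, and one-sided lags reaching backward in time); the paper's actual member selection may differ.

```python
import numpy as np


def lagged_ensemble(forecasts, valid_step, lead, n_lags, init_interval=1):
    """Pool deterministic forecasts that all verify at `valid_step`.

    forecasts: array of shape (n_init, n_lead, n_grid), where
        forecasts[i, k] is the forecast initialized at step i with a
        lead of k steps (assumed layout, for illustration only).
    lead: lead time (in steps) of the most recent member.
    n_lags: number of members; each older member has a longer lead.
    init_interval: spacing (in steps) between successive initializations.
    Returns an array of shape (n_lags, n_grid).
    """
    n_init, n_lead = forecasts.shape[:2]
    members = []
    for j in range(n_lags):
        member_lead = lead + j * init_interval
        init = valid_step - member_lead  # older init, same valid time
        if init < 0 or init >= n_init or member_lead >= n_lead:
            raise ValueError("forecast library does not cover this lag")
        members.append(forecasts[init, member_lead])
    return np.stack(members)


# Toy library: the forecast value equals the valid time plus an error
# that grows with lead time, copied across a 4-point grid.
init_idx = np.arange(10)[:, None, None]
lead_idx = np.arange(6)[None, :, None]
toy_forecasts = (init_idx + 1.1 * lead_idx) * np.ones((1, 1, 4))

toy_members = lagged_ensemble(toy_forecasts, valid_step=8, lead=2, n_lags=3)
```

Because members come from an existing library of deterministic runs, no perturbed initial conditions or noise injection are needed, which is the property the abstract relies on for a like-for-like comparison.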
Taxonomy
Topics: Meteorological Phenomena and Simulations
Methods: Lib
