Evaluating Extreme Precipitation Forecasts: A Threshold-Weighted, Spatial Verification Approach for Comparing an AI Weather Prediction Model Against a High-Resolution NWP Model

Nicholas Loveday; Tracy Hertneky

arXiv:2510.25045·physics.ao-ph·October 30, 2025

TL;DR

This paper presents a new evaluation framework combining spatial verification and threshold-weighted scoring to compare AI-based weather forecasts with traditional models, focusing on extreme precipitation events and spatial coherence.

Contribution

The study introduces a flexible, user-oriented verification method that extends existing approaches to better assess extreme weather prediction performance of AIWP and NWP models.

Findings

01. The NWP model outperforms the AIWP model in extreme event prediction at short lead times.
02. Model rankings depend on the neighborhood size used in evaluation.
03. The NWP model has better discrimination ability at short lead times; the AIWP model is slightly better after 24 hours.

Abstract

Recent advances in AI-based weather prediction have led to the development of artificial intelligence weather prediction (AIWP) models with competitive forecast skill compared to traditional numerical weather prediction (NWP) models, but with substantially reduced computational cost. There is a strong need for appropriate methods to evaluate their ability to predict extreme weather events, particularly when spatial coherence is important and grid resolutions differ between models. We introduce a verification framework that combines spatial verification methods and proper scoring rules. Specifically, the framework extends the High-Resolution Assessment (HiRA) approach with threshold-weighted scoring rules. It enables user-oriented evaluation consistent with how forecasts may be interpreted by operational meteorologists or used in simple post-processing systems. The method supports targeted evaluation of extreme…
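
As a rough illustration of how a HiRA-style neighborhood can be paired with a threshold-weighted score, the sketch below treats the forecast values in a square window around an observation point as a pseudo-ensemble and scores it with a threshold-weighted CRPS (twCRPS). The abstract does not say which scoring rule or window shape the authors use, so the twCRPS form, the square window, and the names `neighborhood_values`, `tw_crps_ensemble`, `half_width`, and `threshold` are all assumptions made for illustration, not the paper's implementation.

```python
import numpy as np


def neighborhood_values(field, i, j, half_width):
    """Forecast values in a square window centred on grid cell (i, j).

    Hypothetical helper: the window is clipped at the grid edges.
    """
    i0, i1 = max(i - half_width, 0), min(i + half_width + 1, field.shape[0])
    j0, j1 = max(j - half_width, 0), min(j + half_width + 1, field.shape[1])
    return field[i0:i1, j0:j1].ravel()


def tw_crps_ensemble(members, obs, threshold):
    """Threshold-weighted CRPS of an ensemble, weight w(z) = 1 for z >= threshold.

    Uses the chaining function v(z) = max(z, threshold):
        twCRPS = E|v(X) - v(y)| - 0.5 * E|v(X) - v(X')|
    Lower is better. Assumed form; the paper may use a different weight.
    """
    members = np.asarray(members, dtype=float)
    v_members = np.maximum(members, threshold)
    v_obs = max(float(obs), float(threshold))
    accuracy_term = np.mean(np.abs(v_members - v_obs))
    spread_term = 0.5 * np.mean(np.abs(v_members[:, None] - v_members[None, :]))
    return float(accuracy_term - spread_term)


if __name__ == "__main__":
    # Synthetic precipitation field (mm), purely illustrative data.
    rng = np.random.default_rng(0)
    forecast = rng.gamma(shape=0.5, scale=10.0, size=(20, 20))
    observed_mm = 35.0  # hypothetical gauge observation at cell (10, 10)

    for half_width in (0, 1, 3):
        pseudo_ensemble = neighborhood_values(forecast, 10, 10, half_width)
        score = tw_crps_ensemble(pseudo_ensemble, observed_mm, threshold=25.0)
        print(f"half_width={half_width}: twCRPS={score:.3f}")
```

Changing `half_width` changes the pseudo-ensemble and therefore the score, which gives some intuition for why model rankings can depend on the neighborhood size used in evaluation.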
