SAFE: A Novel Approach to AI Weather Evaluation through Stratified Assessments of Forecasts over Earth

Nick Masi; Randall Balestriero

arXiv:2510.26099·cs.LG·October 31, 2025

TL;DR

SAFE is a stratified evaluation framework for weather forecasts. It reveals disparities in model performance across geospatial and socio-economic attributes, enabling more equitable, location-specific model assessment.

Contribution

The paper presents a stratified assessment package that evaluates weather forecast accuracy across attributes of Earth gridpoints (territory, global subregion, income, and landcover), highlighting performance disparities and promoting fairness in model evaluation.

Findings

01 Models show significant performance disparities across regions and attributes.

02 SAFE uncovers biases in existing weather prediction models.

03 Benchmarking reveals which models are most fair across different strata.

Abstract

The dominant paradigm in machine learning is to assess model performance based on average loss across all samples in some test set. This amounts to averaging performance geospatially across the Earth in weather and climate settings, failing to account for the non-uniform distribution of human development and geography. We introduce Stratified Assessments of Forecasts over Earth (SAFE), a package for elucidating the stratified performance of a set of predictions made over Earth. SAFE integrates various data domains to stratify by different attributes associated with geospatial gridpoints: territory (usually country), global subregion, income, and landcover (land or water). This allows us to examine the performance of models for each individual stratum of the different attributes (e.g., the accuracy in every individual country). To demonstrate its importance, we utilize SAFE to benchmark…
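
The contrast the abstract draws between a single global average and per-stratum performance can be illustrated with a small sketch. This is not the SAFE package's API: the function name `stratified_rmse`, the toy grid, and the choice of RMSE with cosine-of-latitude area weights are all assumptions made for illustration.

```python
import numpy as np

def stratified_rmse(pred, truth, lat_deg, strata):
    """Hypothetical helper: area-weighted RMSE per stratum on a regular grid.

    pred, truth: arrays of shape (n_lat, n_lon)
    lat_deg:     array of shape (n_lat,), gridpoint latitudes in degrees
    strata:      array of shape (n_lat, n_lon), one label per gridpoint
    """
    sq_err = (pred - truth) ** 2
    # Assumption: cos(latitude) approximates relative gridcell area.
    weights = np.cos(np.deg2rad(lat_deg))[:, None] * np.ones_like(sq_err)
    scores = {}
    for label in np.unique(strata):
        mask = strata == label
        mse = np.sum(weights[mask] * sq_err[mask]) / np.sum(weights[mask])
        scores[label.item()] = float(np.sqrt(mse))
    return scores

# Toy 3x4 grid: forecast error is 1.0 over "land" and 3.0 over "water".
lat = np.array([-45.0, 0.0, 45.0])
truth = np.zeros((3, 4))
pred = truth.copy()
pred[:, :2] += 1.0
pred[:, 2:] += 3.0
landcover = np.array([["land", "land", "water", "water"]] * 3)

scores = stratified_rmse(pred, truth, lat, landcover)
# A single global score hides the gap between the two strata.
global_score = stratified_rmse(pred, truth, lat, np.full((3, 4), "all"))["all"]
```

In this toy case the global score sits between the two per-stratum scores, so it says nothing about which gridpoints are served worse. Stratifying by territory, subregion, or income would follow the same pattern with a different label array.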
