SAFE: A Novel Approach to AI Weather Evaluation through Stratified Assessments of Forecasts over Earth
Nick Masi, Randall Balestriero

TL;DR
SAFE introduces a stratified evaluation framework for weather forecasts, revealing disparities in model performance across geospatial and socio-economic attributes, thus enabling more equitable and location-specific model assessments.
Contribution
The paper presents a stratified assessment package that evaluates weather forecast accuracy for each stratum of several Earth attributes (territory, global subregion, income, and landcover), exposing performance disparities that a global average hides and supporting fairer model evaluation.
Findings
Forecast models show marked performance disparities across the strata of each attribute, such as individual territories, subregions, and income groups.
SAFE uncovers biases in existing weather prediction models.
Benchmarking with SAFE reveals which models are fairest across the different strata.
Abstract
The dominant paradigm in machine learning is to assess model performance based on average loss across all samples in some test set. This amounts to averaging performance geospatially across the Earth in weather and climate settings, failing to account for the non-uniform distribution of human development and geography. We introduce Stratified Assessments of Forecasts over Earth (SAFE), a package for elucidating the stratified performance of a set of predictions made over Earth. SAFE integrates various data domains to stratify by different attributes associated with geospatial gridpoints: territory (usually country), global subregion, income, and landcover (land or water). This allows us to examine the performance of models for each individual stratum of the different attributes (e.g., the accuracy in every individual country). To demonstrate its importance, we utilize SAFE to benchmark…
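The contrast the abstract draws between a single global average and per-stratum scores can be illustrated with a minimal sketch. This is not the SAFE package's API: the function name `stratified_rmse`, the toy grid, and the income labels are hypothetical, and the sketch ignores area or latitude weighting of gridpoints for simplicity.

```python
import numpy as np


def stratified_rmse(pred, truth, strata):
    """Return {stratum label: RMSE over the gridpoints carrying that label}.

    Hypothetical helper for illustration only; all three inputs are arrays
    of identical shape, with `strata` holding one label per gridpoint.
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    strata = np.asarray(strata)
    if not (pred.shape == truth.shape == strata.shape):
        raise ValueError("pred, truth, and strata must share one shape")
    sq_err = (pred - truth) ** 2
    # Boolean masking selects only the gridpoints belonging to each stratum.
    return {
        str(label): float(np.sqrt(sq_err[strata == label].mean()))
        for label in np.unique(strata)
    }


# Toy 2x3 grid (made-up values): the forecast error is larger in the
# gridpoints labelled "low" than in those labelled "high".
truth = np.zeros((2, 3))
pred = np.array([[1.0, 1.0, 3.0],
                 [1.0, 1.0, 3.0]])
income = np.array([["high", "high", "low"],
                   ["high", "high", "low"]])

# Single number over all gridpoints, as in the usual averaged evaluation.
global_rmse = float(np.sqrt(((pred - truth) ** 2).mean()))
# One number per stratum, which exposes the gap the average conceals.
per_stratum = stratified_rmse(pred, truth, income)

print(f"global RMSE: {global_rmse:.3f}")
for label, score in per_stratum.items():
    print(f"{label}: {score:.3f}")
```

In this toy case the global score sits between the two per-stratum scores, so reporting it alone would understate the error in the "low" stratum and overstate it in the "high" one.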
