A PMP-inspired Evaluation Framework for Assessing Deep-Learning Earth System Models
Giuliana Pallotta, Shiheng Duan, Céline Bonfils, Jiwoo Lee, Seth Goodnight, Paul Ullrich

TL;DR
This paper introduces an evaluation framework for Deep-Learning Earth System Models (DL-ESMs) built on standardized diagnostics from the PCMDI Metrics Package (PMP). The framework assesses how well these models simulate climate and identifies their strengths and remaining challenges.
Contribution
It presents a novel evaluation framework that applies traditional climate-model diagnostics to DL-ESMs, so they can be assessed for climate research and model development.
Findings
DL-ESMs show strengths in large-scale climate features
Persistent challenges remain in precipitation and tropical variability
Framework extends assessment beyond short-range forecast skill
Abstract
In recent years, Deep-Learning Earth System Models (DL-ESMs) have emerged as promising, computationally efficient complements to traditional Earth system models. Here, we present an evaluation framework for testing DL-ESMs from a climate-model-development perspective using standardized diagnostics from the PCMDI Metrics Package (PMP). This framework allows DL-ESMs, including Ai2's ACE2 and Google's NeuralGCM, to be assessed with metrics that quantify their ability to reproduce climatology, major modes of variability, monsoon behavior, and precipitation variability relative to observational reference datasets and CMIP-class benchmarks. By evaluating DL-ESMs with tools commonly used for traditional models, we extend their assessment beyond short-range forecast skill and toward climate-relevant applications. The results identify encouraging strengths in several large-scale fields and modes…
