A PMP-inspired Evaluation Framework for Assessing Deep-Learning Earth System Models

Giuliana Pallotta; Shiheng Duan; Céline Bonfils; Jiwoo Lee; Seth Goodnight; Paul Ullrich

arXiv:2604.06567·physics.ao-ph·May 20, 2026

TL;DR

This paper introduces an evaluation framework for Deep-Learning Earth System Models using standardized diagnostics, enabling comprehensive assessment of their climate simulation capabilities and identifying strengths and challenges.

Contribution

It presents a novel evaluation framework applying traditional climate diagnostics to DL-ESMs, facilitating their assessment for climate research and development.

Findings

01. DL-ESMs show strengths in large-scale climate features

02. Persistent challenges remain in precipitation and tropical variability

03. Framework extends assessment beyond short-range forecast skill

Abstract

In recent years, Deep-Learning Earth System Models (DL-ESMs) have emerged as promising, computationally efficient complements to traditional Earth system models. Here, we present an evaluation framework for testing DL-ESMs from a climate-model-development perspective using standardized diagnostics from the PCMDI Metrics Package (PMP). This framework allows DL-ESMs, including Ai2's ACE2 and Google's NeuralGCM, to be assessed with metrics that quantify their ability to reproduce climatology, major modes of variability, monsoon behavior, and precipitation variability relative to observational reference datasets and CMIP-class benchmarks. By evaluating DL-ESMs with tools commonly used for traditional models, we extend their assessment beyond short-range forecast skill and toward climate-relevant applications. The results identify encouraging strengths in several large-scale fields and modes…
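To make the idea of a climatology metric concrete, the following is a minimal sketch of one common statistic of this kind: an area-weighted root-mean-square error between a model field and a reference field on a regular latitude-longitude grid. It is an illustration only and is not taken from the paper or from the PMP codebase; the function name `area_weighted_rmse`, the cosine-latitude weighting, and the toy arrays are all assumptions chosen for the example.

```python
import numpy as np


def area_weighted_rmse(model, reference, lat_deg):
    """RMSE between two (lat, lon) fields, weighting each row by cos(latitude).

    Hypothetical helper for illustration; not part of PMP.
    """
    model = np.asarray(model, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # On a regular lat-lon grid, cell area is roughly proportional to cos(latitude).
    weights = np.cos(np.deg2rad(np.asarray(lat_deg, dtype=float)))
    # Broadcast the per-latitude weights across all longitudes.
    w2d = np.broadcast_to(weights[:, None], model.shape)
    sq_err = (model - reference) ** 2
    return float(np.sqrt(np.sum(w2d * sq_err) / np.sum(w2d)))


# Toy data (made up): 3 latitudes x 4 longitudes.
lat = np.array([-60.0, 0.0, 60.0])
reference_field = np.zeros((3, 4))
model_field = np.zeros((3, 4))
model_field[1, :] = 3.0  # error confined to the equatorial row

weighted = area_weighted_rmse(model_field, reference_field, lat)
unweighted = float(np.sqrt(np.mean((model_field - reference_field) ** 2)))
print(weighted, unweighted)
```

In this toy case the error sits entirely at the equator, where grid cells cover the most area, so the weighted score is larger than the unweighted one. An evaluation package would typically add regridding to a common grid, masking, and seasonal or regional subsetting on top of a statistic like this.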
