Exploring Design Choices for Autoregressive Deep Learning Climate Models
Florian Gallusser, Simon Hentschel, Anna Krause, Andreas Hotho

TL;DR
This paper compares the long-term stability of three deep learning atmospheric models (FourCastNet, SFNO, and ClimaX) and identifies design choices that enable stable 10-year climate rollouts while preserving the statistical properties of the reference dataset.
Contribution
A systematic analysis of how model architecture, autoregressive training steps, model capacity, and the choice of prognostic variables influence the long-term stability of DL-based climate models.
Findings
SFNO rollouts show the greatest robustness to hyperparameter choices.
All three models can become unstable depending on the random seed and the set of prognostic variables.
Certain configurations enable stable 10-year rollouts.
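The seed dependence noted above can be probed with a simple sweep. The following is a minimal sketch, not the authors' evaluation code: `rollout_is_stable`, `stable_fraction`, `make_toy_step`, and the `bound` threshold are hypothetical names and values chosen for illustration, and the toy linear map stands in for a trained model.

```python
import numpy as np

def rollout_is_stable(step_fn, x0, n_steps, bound):
    """Roll step_fn forward n_steps; return False on non-finite or out-of-bound values."""
    x = x0
    for _ in range(n_steps):
        x = step_fn(x)  # each prediction becomes the next input
        if not np.all(np.isfinite(x)) or np.max(np.abs(x)) > bound:
            return False
    return True

def stable_fraction(make_step_fn, seeds, x0, n_steps, bound=1e3):
    """Fraction of seeds whose rollout stays finite and within +/- bound."""
    results = [rollout_is_stable(make_step_fn(s), x0, n_steps, bound) for s in seeds]
    return sum(results) / len(results)

def make_toy_step(seed):
    """Hypothetical stand-in for a trained model: a linear map with a seed-dependent gain."""
    gain = np.random.default_rng(seed).uniform(0.95, 1.05)
    return lambda x: gain * x

# A 5.625 degree global grid has 32 x 64 (latitude x longitude) points.
x0 = np.ones((32, 64))
print(stable_fraction(make_toy_step, seeds=range(10), x0=x0, n_steps=500))
```

In a real study, the bound would more plausibly be derived from the climatology of the reference dataset than fixed by hand.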
Abstract
Deep Learning models have achieved state-of-the-art performance in medium-range weather prediction but often fail to maintain physically consistent rollouts beyond 14 days. In contrast, a few atmospheric models demonstrate stability over decades, though the key design choices enabling this remain unclear. This study quantitatively compares the long-term stability of three prominent DL-MWP architectures - FourCastNet, SFNO, and ClimaX - trained on ERA5 reanalysis data at 5.625° resolution. We systematically assess the impact of autoregressive training steps, model capacity, and choice of prognostic variables, identifying configurations that enable stable 10-year rollouts while preserving the statistical properties of the reference dataset. Notably, rollouts with SFNO exhibit the greatest robustness to hyperparameter choices, yet all models can experience instability depending on the…
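To make "autoregressive training steps" concrete, here is a minimal sketch of a multi-step loss in which the model's own prediction is fed back as input before comparing against the next target. It assumes a mean-squared-error objective averaged uniformly over steps; the paper's actual loss and weighting are not given in this abstract, and `autoregressive_loss` and `step_fn` are hypothetical names.

```python
import numpy as np

def autoregressive_loss(step_fn, x0, targets):
    """Mean squared error averaged over len(targets) autoregressive steps.

    step_fn : maps a state array to the predicted next state (assumed model)
    x0      : initial state, e.g. shape (32, 64) on a 5.625 degree grid
    targets : sequence of reference states for steps 1..k
    """
    x = x0
    losses = []
    for y in targets:
        x = step_fn(x)  # feed the previous prediction back in
        losses.append(np.mean((x - y) ** 2))
    return float(np.mean(losses))

# Toy usage: a "model" that adds 1 each step, scored against all-zero targets.
x0 = np.zeros((32, 64))
targets = [np.zeros((32, 64)), np.zeros((32, 64))]
print(autoregressive_loss(lambda x: x + 1.0, x0, targets))  # 2.5
```

With a single target this reduces to ordinary one-step training. Longer target sequences penalise errors that compound over the rollout, which is the knob the abstract refers to.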
Taxonomy
Topics: Computational and Text Analysis Methods
Methods: Sparse Evolutionary Training
