ClimateSet: A Large-Scale Climate Model Dataset for Machine Learning
Julia Kaltenborn, Charlotte E. E. Lange, Venkatesh Ramesh, Philippe Brouillard, Yaniv Gurwicz, Chandni Nagda, Jakob Runge, Peer Nowack, David Rolnick

TL;DR
ClimateSet is a comprehensive, large-scale climate model dataset designed to support machine learning tasks like emulation and prediction, enabling scalable climate analysis and policy support.
Contribution
We introduce ClimateSet, a large, ML-ready climate dataset with a modular pipeline, and demonstrate its use as a benchmark for climate model emulation.
Findings
ML models' performance varies across different climate models
A super emulator trained on multiple models can efficiently project new scenarios
ClimateSet facilitates scalable ML applications in climate science
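The "super emulator" finding refers to a single emulator trained on data from several climate models. One simple way to let one set of weights serve several climate models is to condition the emulator on a model identifier. The sketch below assumes a linear ridge emulator whose inputs are the raw forcings, forcing × one-hot model-ID interaction features, and a one-hot per-model offset. It is trained on synthetic stand-in data in which each "climate model" responds to the same forcings with a different sensitivity and offset. The conditioning scheme, the data, and the helper names (`super_features`, `fit_ridge`, `simulate`) are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

def super_features(X, model_ids, n_models):
    """Inputs for one emulator shared across several climate models.

    Hypothetical conditioning scheme: raw forcings, per-model copies of the
    forcings (forcing x one-hot model ID), and a one-hot per-model offset.
    """
    onehot = np.eye(n_models)[model_ids]                               # (n, K)
    inter = (X[:, :, None] * onehot[:, None, :]).reshape(len(X), -1)  # (n, F*K)
    return np.hstack([X, inter, onehot])

def fit_ridge(F, Y, alpha=1e-3):
    # A small ridge penalty keeps the solve well-posed: the raw forcings equal
    # the sum of their per-model copies, so F^T F is singular without it.
    return np.linalg.solve(F.T @ F + alpha * np.eye(F.shape[1]), F.T @ Y)

# Synthetic stand-in for K climate models that respond to the same forcings
# with different sensitivities and offsets (NOT real CMIP6 output).
rng = np.random.default_rng(1)
K, n_forcings, n_cells, n_per_model = 3, 4, 128, 150
W_base = rng.normal(size=(n_forcings, n_cells))
scales = np.array([0.8, 1.0, 1.3])
offsets = rng.normal(size=(K, n_cells))

def simulate(X, ids):
    # Per-sample response: scale_k * (X @ W_base) + offset_k
    return np.einsum("nf,fc,n->nc", X, W_base, scales[ids]) + offsets[ids]

ids_train = np.repeat(np.arange(K), n_per_model)
X_train = rng.normal(size=(K * n_per_model, n_forcings))
Y_train = simulate(X_train, ids_train) + 0.01 * rng.normal(size=(K * n_per_model, n_cells))

# New forcing samples stand in for an unseen scenario.
ids_test = np.repeat(np.arange(K), 20)
X_test = rng.normal(size=(K * 20, n_forcings))
Y_test = simulate(X_test, ids_test)

# One emulator trained on pooled data and conditioned on the model ID.
W = fit_ridge(super_features(X_train, ids_train, K), Y_train)
Y_pred = super_features(X_test, ids_test, K) @ W
rmse_super = float(np.sqrt(np.mean((Y_pred - Y_test) ** 2)))

# Baseline: same pooled data, but the emulator never sees the model ID.
W0 = fit_ridge(np.hstack([X_train, np.ones((len(X_train), 1))]), Y_train)
Y0 = np.hstack([X_test, np.ones((len(X_test), 1))]) @ W0
rmse_pooled = float(np.sqrt(np.mean((Y0 - Y_test) ** 2)))
print(f"conditioned RMSE {rmse_super:.4f} vs unconditioned {rmse_pooled:.4f}")
```

The unconditioned baseline is there to show why some form of model conditioning is needed when climate models disagree: a pooled emulator that ignores the model ID can only learn an average response.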
Abstract
Climate models have been key to assessing the impact of climate change and simulating future climate scenarios. The machine learning (ML) community has taken an increased interest in supporting climate scientists' efforts on tasks such as climate model emulation, downscaling, and prediction. Many of these tasks have been addressed using datasets created with a single climate model. However, both the climate science and ML communities have suggested that to address those tasks at scale, we need large, consistent, and ML-ready climate model datasets. Here, we introduce ClimateSet, a dataset containing the inputs and outputs of 36 climate models from the Input4MIPs and CMIP6 archives. In addition, we provide a modular dataset pipeline for retrieving and preprocessing additional climate models and scenarios. We showcase the potential of our dataset by using it as a benchmark for…
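Climate model emulation can be framed as supervised learning: learn a fast surrogate that maps a climate model's inputs (forcings, as archived in Input4MIPs) to its outputs (fields such as surface temperature, as archived in CMIP6). A minimal sketch under stated assumptions follows. It uses a linear ridge-regression emulator on synthetic data, with 4 hypothetical forcing features per time step and an 8 × 16 latitude-longitude output grid. The feature choice, grid size, and function names (`fit_ridge_emulator`, `emulate`) are illustrative only and are not the benchmark models evaluated on ClimateSet.

```python
import numpy as np

def fit_ridge_emulator(X, Y, alpha=1e-3):
    """Fit a linear ridge-regression emulator.

    X: (n_samples, n_forcings) forcing inputs per time step (hypothetical
       features, e.g. global emissions of a few species).
    Y: (n_samples, n_cells) flattened output field on a lat-lon grid.
    Returns W of shape (n_forcings + 1, n_cells); the last row is the bias.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias column
    reg = alpha * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0                              # do not penalise the bias
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ Y)

def emulate(W, X):
    """Predict flattened output fields for new forcing inputs."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ W

# Synthetic stand-in data (NOT ClimateSet data).
rng = np.random.default_rng(0)
n_lat, n_lon, n_forcings = 8, 16, 4
W_true = rng.normal(size=(n_forcings, n_lat * n_lon))
X_train = rng.normal(size=(200, n_forcings))      # "training scenarios"
Y_train = X_train @ W_true + 0.01 * rng.normal(size=(200, n_lat * n_lon))
X_test = rng.normal(size=(50, n_forcings))        # "unseen scenario"
Y_test = X_test @ W_true

W = fit_ridge_emulator(X_train, Y_train)
Y_pred = emulate(W, X_test).reshape(-1, n_lat, n_lon)  # back to (time, lat, lon)
rmse = float(np.sqrt(np.mean((Y_pred - Y_test.reshape(-1, n_lat, n_lon)) ** 2)))
print(f"test RMSE: {rmse:.4f}")
```

A linear model is chosen here only because it keeps the input/output framing visible in a few lines; real emulators for gridded climate output are typically nonlinear and may also use the temporal history of the forcings.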
Taxonomy
Topics: Scientific Computing and Data Management · Climate variability and models · Meteorological Phenomena and Simulations
