An Assessment of Data Transfer Performance for Large-Scale Climate Data Analysis and Recommendations for the Data Infrastructure for CMIP6
Eli Dart, Michael F. Wehner, Prabhat

TL;DR
This paper analyzes the data transfer performance of large-scale climate data from the CMIP5 archive to supercomputing centers, highlighting current challenges and proposing practical improvements for efficient data handling in future CMIP6 projects.
Contribution
It provides an empirical assessment of data transfer workflows and performance, and offers actionable recommendations to enhance data infrastructure for climate research.
Findings
Data transfer rates using the ESGF are often slower than typical residential internet speeds.
Significant performance improvements are achievable with current best practices.
Recommendations include adopting Science DMZ models and establishing performance metrics.
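To put the findings above in perspective, the following minimal sketch estimates how long it would take to stage roughly 56 terabytes (the volume reported in the abstract) at several sustained transfer rates. The rates are hypothetical values chosen for illustration, not measurements from the paper, and the function name `transfer_days` is likewise an assumption of this sketch. Decimal units are used (1 TB = 10^12 bytes).

```python
def transfer_days(size_tb: float, rate_mbps: float) -> float:
    """Days needed to move size_tb terabytes at a sustained rate of rate_mbps megabits/s."""
    bits = size_tb * 1e12 * 8           # decimal terabytes -> bits
    seconds = bits / (rate_mbps * 1e6)  # megabits/s -> bits/s
    return seconds / 86400              # seconds -> days


ARCHIVE_TB = 56  # approximate volume staged in the study (from the abstract)

# Hypothetical sustained rates in Mbit/s, for illustration only.
for rate in (20, 100, 1000, 10000):
    print(f"{rate:>6} Mbit/s -> {transfer_days(ARCHIVE_TB, rate):8.1f} days")
```

Under these assumptions, a sustained 100 Mbit/s would need about 52 days for the full data set, while 1 Gbit/s would need a little over 5 days, which is why sustained end-to-end throughput matters more than nominal link capacity.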
Abstract
We document the data transfer workflow, data transfer performance, and other aspects of staging approximately 56 terabytes of climate model output data from the distributed Coupled Model Intercomparison Project (CMIP5) archive to the National Energy Research Scientific Computing Center (NERSC) at the Lawrence Berkeley National Laboratory, as required for tracking and characterizing extratropical storms, a phenomenon of importance in the mid-latitudes. We present this analysis to illustrate the current challenges in assembling multi-model data sets at major computing facilities for large-scale studies of CMIP5 data. Because of the larger archive size of the upcoming CMIP6 phase of model intercomparison, we expect such data transfers to become increasingly important, and perhaps routinely necessary. We find that data transfer rates using the ESGF are often slower than what is typically available…
Taxonomy
Topics: Climate variability and models · Meteorological Phenomena and Simulations · Tropical and Extratropical Cyclones Research
