Automated, Reliable, and Efficient Continental-Scale Replication of 7.3 Petabytes of Climate Simulation Data: A Case Study
Lukasz Lacinski, Lee Liming, Steven Turoscy, Cameron Harr, Kyle Chard, Eli Dart, Paul Durack, Sasha Ames, Forrest M. Hoffman, and Ian T. Foster

TL;DR
This paper describes a successful automated replication of 7.3 petabytes of climate data across multiple national labs, demonstrating the effectiveness of a simple, Globus-based tool for large-scale, reliable data transfer.
Contribution
It introduces an automated replication process using Globus that efficiently transfers massive climate datasets across multiple sites with high reliability.
Findings
Replicated 7.3 PB of climate data (some 29 million files) from LLNL to both ANL and ORNL.
Achieved high transfer efficiency using Globus and ESnet.
Demonstrated reliable recovery from transient failures.
Abstract
We report on our experiences replicating 7.3 petabytes (PB) of Earth System Grid Federation (ESGF) climate simulation data from Lawrence Livermore National Laboratory (LLNL) in California to Argonne National Laboratory (ANL) in Illinois and Oak Ridge National Laboratory (ORNL) in Tennessee. This movement of some 29 million files, twice, undertaken in order to establish new ESGF nodes at ANL and ORNL, was performed largely automatically by a simple replication tool: a script that invoked Globus to transfer large bundles of files while tracking progress in a database. Under the covers, Globus organized transfers to make efficient use of the high-speed Energy Sciences Network (ESnet) and the data transfer nodes deployed at participating sites, and also addressed security, integrity checking, and recovery from a variety of transient failures. This success demonstrates the considerable…
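The abstract's description of the replication tool (a script that hands bundles of files to a transfer service and records progress in a database) can be illustrated with a minimal sketch. This is not the authors' script: the SQLite schema, the status names, the `max_active` limit, and the `submit` callable (a stand-in for whatever call submits a Globus transfer and returns a task identifier) are all assumptions made for illustration.

```python
import sqlite3
from typing import Callable, Iterable


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per bundle (e.g. one directory of files).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bundles ("
        "path TEXT PRIMARY KEY, "
        "status TEXT NOT NULL DEFAULT 'pending', "
        "task_id TEXT)"
    )
    conn.commit()


def add_bundles(conn: sqlite3.Connection, paths: Iterable[str]) -> None:
    # INSERT OR IGNORE makes re-running the script safe: known bundles are kept.
    conn.executemany(
        "INSERT OR IGNORE INTO bundles (path) VALUES (?)",
        [(p,) for p in paths],
    )
    conn.commit()


def run_once(
    conn: sqlite3.Connection,
    submit: Callable[[str], str],
    max_active: int = 2,
) -> int:
    """Submit pending bundles until max_active are in flight.

    `submit` is an assumed stand-in for the transfer service: it takes a
    bundle path and returns a task identifier. Returns the number submitted.
    """
    active = conn.execute(
        "SELECT COUNT(*) FROM bundles WHERE status = 'active'"
    ).fetchone()[0]
    slots = max(0, max_active - active)
    rows = conn.execute(
        "SELECT path FROM bundles WHERE status = 'pending' "
        "ORDER BY path LIMIT ?",
        (slots,),
    ).fetchall()
    for (path,) in rows:
        task_id = submit(path)
        conn.execute(
            "UPDATE bundles SET status = 'active', task_id = ? WHERE path = ?",
            (task_id, path),
        )
    conn.commit()
    return len(rows)


def mark(conn: sqlite3.Connection, task_id: str, succeeded: bool) -> None:
    # A failed task returns its bundle to 'pending' so a later pass retries it.
    status = "done" if succeeded else "pending"
    conn.execute(
        "UPDATE bundles SET status = ?, task_id = NULL WHERE task_id = ?",
        (status, task_id),
    )
    conn.commit()
```

Because all state lives in the database, a script of this shape can be stopped and restarted without losing track of which bundles are finished, which is one plausible reading of "tracking progress in a database". Retries of individual files within a task are left to the transfer service, as the abstract attributes recovery from transient failures to Globus.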
Taxonomy
Topics: Atmospheric and Environmental Gas Dynamics · Meteorological Phenomena and Simulations · Climate variability and models
