Managed Network Services for Exascale Data Movement Across Large Global Scientific Collaborations
Frank Würthwein, Jonathan Guiang, Aashay Arora, Diego Davila, John Graham, Dima Mishin, Thomas Hutton, Igor Sfiligoi, Harvey Newman, Justas Balcas, Tom Lehman, Xi Yang, Chin Guok

TL;DR
This paper proposes a managed network service that lets large scientific collaborations schedule data movement at exascale. The service is demonstrated in the context of CERN's LHC and aims to reduce storage over-provisioning and improve workflow efficiency.
Contribution
It introduces a novel integrated co-scheduling approach for networks within high-level workflows, addressing uncontrolled bandwidth competition in global scientific data infrastructures.
Findings
Co-scheduling might reduce storage needs by more than an order of magnitude.
Demonstrated functionality in the context of CERN's LHC.
Next steps outlined for production deployment.
Abstract
Unique scientific instruments designed and operated by large global collaborations are expected to produce Exabyte-scale data volumes per year by 2030. These collaborations depend on globally distributed storage and compute to turn raw data into science. While all of these infrastructures have batch scheduling capabilities to share compute, Research and Education networks lack those capabilities. There is thus uncontrolled competition for bandwidth between and within collaborations. As a result, data "hogs" disk space at processing facilities for much longer than it takes to process, leading to vastly over-provisioned storage infrastructures. Integrated co-scheduling of networks as part of high-level managed workflows might reduce these storage needs by more than an order of magnitude. This paper describes such a solution, demonstrates its functionality in the context of the Large…
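The link between transfer time and storage footprint can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the paper: it assumes a simplified steady-state model (Little's law, occupancy = arrival rate × residence time) in which data occupies disk for the whole transfer and processing time, and every number in it is hypothetical.

```python
def buffer_needed_tb(ingest_rate_tb_per_h, transfer_h, processing_h):
    """Steady-state disk occupancy in TB (Little's law).

    Assumption: each dataset sits on disk from the start of its
    transfer until processing finishes, so its residence time is
    transfer_h + processing_h.
    """
    return ingest_rate_tb_per_h * (transfer_h + processing_h)


# Hypothetical figures, chosen only for illustration.
INGEST_RATE_TB_PER_H = 10.0
PROCESSING_H = 2.0

# Unmanaged network: transfers compete for bandwidth and drag on.
unmanaged = buffer_needed_tb(INGEST_RATE_TB_PER_H, transfer_h=30.0,
                             processing_h=PROCESSING_H)

# Co-scheduled network: bandwidth is reserved, so transfers finish quickly.
managed = buffer_needed_tb(INGEST_RATE_TB_PER_H, transfer_h=1.0,
                           processing_h=PROCESSING_H)

ratio = unmanaged / managed
print(f"unmanaged: {unmanaged:.0f} TB, managed: {managed:.0f} TB, "
      f"ratio: {ratio:.1f}x")
```

In this toy model the disk footprint is dominated by transfer time whenever transfers take much longer than processing, which is the regime the abstract describes.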
Taxonomy
Topics: Distributed and Parallel Computing Systems · Scientific Computing and Data Management · Advanced Data Storage Technologies
