Design and Evaluation of a Collective IO Model for Loosely Coupled Petascale Programming
Zhao Zhang, Allan Espinosa, Kamil Iskra, Ioan Raicu, Ian Foster, Michael Wilde

TL;DR
This paper presents a collective IO model for petascale loosely coupled programming that makes data distribution and collection more efficient, reducing the need for manual tuning and making large-scale systems easier to program.
Contribution
The paper introduces a prototype collective IO model that leverages local file systems and broadcast techniques to optimize data handling in loosely coupled petascale applications.
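This page does not describe how the prototype's broadcast works. The following is a minimal sketch that assumes a binomial-tree broadcast, a standard collective technique chosen here only for illustration. It shows why broadcasting helps: one node reads an input file once from the shared file system, and then every node that already holds a copy forwards it to one new node's local file system per round. All N nodes are covered in about ceil(log2 N) rounds, and the shared file system serves only 1 read instead of N. The function name `binomial_broadcast_schedule` is hypothetical.

```python
import math


def binomial_broadcast_schedule(num_nodes):
    """Return a list of rounds; each round is a list of (sender, receiver) pairs.

    Hypothetical illustration: node 0 is assumed to have read the file once
    from the shared file system. In each round, every node that already holds
    a copy forwards it to one node that does not, so the set of holders
    doubles until all num_nodes nodes have the file.
    """
    schedule = []
    have = 1  # nodes 0 .. have-1 currently hold the file
    while have < num_nodes:
        round_pairs = []
        for sender in range(have):
            receiver = sender + have
            if receiver >= num_nodes:
                break  # the last round may be partial
            round_pairs.append((sender, receiver))
        schedule.append(round_pairs)
        have = min(2 * have, num_nodes)
    return schedule


if __name__ == "__main__":
    n = 8
    rounds = binomial_broadcast_schedule(n)
    # Rounds needed vs. the ceil(log2 N) bound; shared-FS reads stay at 1, not n.
    print(len(rounds), math.ceil(math.log2(n)))
```

For 8 nodes this yields 3 rounds. A naive scheme in which every task reads the input directly from the shared file system would issue 8 concurrent reads against it.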
Findings
Achieved high-speed data distribution and collection on Blue Gene/P
Reduced manual tuning in file-based many-task computing
Demonstrated performance improvements with synthetic benchmarks and molecular dynamics application
Abstract
Loosely coupled programming is a powerful paradigm for rapidly creating higher-level applications from scientific programs on petascale systems, typically using scripting languages. This paradigm is a form of many-task computing (MTC) which focuses on the passing of data between programs as ordinary files rather than messages. While it has the significant benefits of decoupling producer and consumer and allowing existing application programs to be executed in parallel with no recoding, its typical implementation using shared file systems places a high performance burden on the overall system and on the user who will analyze and consume the downstream data. Previous efforts have achieved great speedups with loosely coupled programs, but have done so with careful manual tuning of all shared file system access. In this work, we evaluate a prototype collective IO model for file-based MTC.…
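The abstract attributes the performance burden to passing data as ordinary files through a shared file system. On the collection side, one common way to reduce that burden is to let tasks write many small outputs to node-local storage and then ship them to the shared file system as a single archive. The sketch below shows that batching idea under this assumption. It is not the paper's implementation, and `collect_outputs` and its parameters are hypothetical.

```python
import os
import shutil
import tarfile
import tempfile


def collect_outputs(local_dir, shared_dir, archive_name="outputs.tar"):
    """Pack every regular file in local_dir into one archive placed in shared_dir.

    Hypothetical illustration: the archive is built in a local staging
    directory, so the shared file system sees one file create and one
    sequential write instead of one create per task output.
    """
    os.makedirs(shared_dir, exist_ok=True)
    # List outputs before creating any staging files.
    names = sorted(
        n for n in os.listdir(local_dir)
        if os.path.isfile(os.path.join(local_dir, n))
    )
    staging = tempfile.mkdtemp()
    tmp_path = os.path.join(staging, archive_name)
    with tarfile.open(tmp_path, "w") as tar:
        for name in names:
            tar.add(os.path.join(local_dir, name), arcname=name)
    dest = os.path.join(shared_dir, archive_name)
    shutil.move(tmp_path, dest)  # single transfer into the shared directory
    shutil.rmtree(staging)
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as local, tempfile.TemporaryDirectory() as shared:
        # Simulate 100 small per-task output files on local storage.
        for i in range(100):
            with open(os.path.join(local, f"task{i:03d}.out"), "w") as f:
                f.write(f"result {i}\n")
        collect_outputs(local, shared)
        print(os.listdir(shared))  # ['outputs.tar']
```

Here 100 small outputs reach the shared directory as one file. How the batching is actually scheduled across nodes in the paper is not stated on this page.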
