Pilot-Data: An Abstraction for Distributed Data
Andre Luckow, Mark Santcroos, Ashley Zebrowski, Shantenu Jha

TL;DR
Pilot-Data is an abstraction for managing and scheduling large-scale data alongside compute resources in distributed environments, with interoperability and extensibility treated as primary concerns.
Contribution
Pilot-Data extends the Pilot-Job abstraction to include data management, enabling efficient co-placement and scheduling of data and compute in heterogeneous distributed systems.
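The co-placement idea can be illustrated with a minimal sketch. This is not the Pilot-Data API: the names `DataUnit`, `Pilot`, `ComputeUnit`, `transfer_cost`, and `place` are hypothetical, and the greedy "least bytes moved" rule is an assumption chosen for illustration, not the scheduling policy described in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class DataUnit:
    # Hypothetical: a named group of files with a total size in bytes.
    name: str
    size_bytes: int


@dataclass
class Pilot:
    # Hypothetical: a resource placeholder with compute slots and local data.
    name: str
    free_slots: int
    local_data: set = field(default_factory=set)  # names of DataUnits held here


@dataclass
class ComputeUnit:
    # Hypothetical: a task together with the data units it reads.
    name: str
    inputs: list


def transfer_cost(pilot, task):
    """Bytes that would have to be moved to run `task` on `pilot`."""
    return sum(d.size_bytes for d in task.inputs if d.name not in pilot.local_data)


def place(task, pilots):
    """Pick the pilot with a free slot that needs the least data movement."""
    candidates = [p for p in pilots if p.free_slots > 0]
    if not candidates:
        return None
    best = min(candidates, key=lambda p: transfer_cost(p, task))
    best.free_slots -= 1
    # Inputs staged for this task are now local to the chosen pilot.
    best.local_data.update(d.name for d in task.inputs)
    return best


# Illustrative data only: two sites, each holding one of the two inputs.
genome = DataUnit("genome", 8_000_000_000)
reads = DataUnit("reads", 2_000_000_000)
site_a = Pilot("site-a", free_slots=1, local_data={"genome"})
site_b = Pilot("site-b", free_slots=4, local_data={"reads"})

first = place(ComputeUnit("align-1", [genome, reads]), [site_a, site_b])
second = place(ComputeUnit("align-2", [genome, reads]), [site_a, site_b])
print(first.name, second.name)
```

In this sketch the first task goes to the site that already holds the larger input, since only the smaller one has to be moved; once that site has no free slots, the next task falls back to the other site. A real system would also have to weigh bandwidth, queue wait times, and replication, which this example ignores.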
Findings
Demonstrated improved application performance with Pilot-Data integration.
Showcased flexible execution modes for data-intensive applications.
Enabled advanced data-compute co-placement and scheduling capabilities.
Abstract
Scientific problems that depend on processing large amounts of data require overcoming challenges in multiple areas: managing large-scale data distribution, controlling co-placement and scheduling of data with compute resources, and storing, transferring, and managing large volumes of data. Although there exist multiple approaches to addressing each of these challenges, an integrative approach is missing; furthermore, extending existing functionality or enabling interoperable capabilities remains difficult at best. We propose the concept of Pilot-Data to address the fundamental challenges of co-placement and scheduling of data and compute in heterogeneous and distributed environments with interoperability and extensibility as first-order concerns. Pilot-Data is an extension of the Pilot-Job abstraction for supporting the management of data in conjunction with compute tasks. Pilot-Data…
Taxonomy
Topics: Distributed and Parallel Computing Systems · Scientific Computing and Data Management · Advanced Data Storage Technologies
