PSI/J: A Portable Interface for Submitting, Monitoring, and Managing Jobs
Mihael Hategan-Marandiuc, Andre Merzky, Nicholson Collier, Ketan Maheshwari, Jonathan Ozik, Matteo Turilli, Andreas Wilke, Justin M. Wozniak, Kyle Chard, Ian Foster, Rafael Ferreira da Silva, Shantenu Jha, Daniel Laney

TL;DR
PSI/J is a portable API for submitting, monitoring, and managing jobs across diverse HPC schedulers. It makes scientific applications easier to move between HPC systems while adding minimal overhead.
Contribution
The paper introduces a scheduler-agnostic job management API and argues that no viable alternative currently exists for portability across HPC schedulers.
Findings
PSI/J integrates with multiple workflow systems and applications.
Experiments show PSI/J has minimal overhead.
PSI/J enhances the portability of HPC applications across different schedulers.
Abstract
It is generally desirable for high-performance computing (HPC) applications to be portable between HPC systems, for example to make use of more performant hardware, to make effective use of allocations, and to co-locate compute jobs with large datasets. Unfortunately, moving scientific applications between HPC systems is challenging for various reasons, most notably that HPC systems have different HPC schedulers. We introduce PSI/J, a job management abstraction API intended to simplify the construction of software components and applications that are portable over various HPC scheduler implementations. We argue both that such a system is necessary and that no viable alternative currently exists. We analyze similar notable APIs and attempt to determine the factors that influenced their evolution and adoption by the HPC community. We base the design of PSI/J on that analysis. We describe how…
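To make the idea of a job management abstraction concrete, the following is a minimal sketch of a scheduler-agnostic interface, written with the Python standard library only. It is not PSI/J's actual API: the names `JobSpec`, `Job`, `Executor`, `LocalExecutor`, `DryRunBatchExecutor`, and `get_executor` are hypothetical and chosen for illustration, and the batch backend only renders a script rather than contacting a real scheduler.

```python
import shlex
import subprocess
import sys
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Type


@dataclass
class JobSpec:
    """What to run, described independently of any scheduler (hypothetical type)."""
    executable: str
    arguments: List[str] = field(default_factory=list)


@dataclass
class Job:
    """A job plus the state its backend reports for it (hypothetical type)."""
    spec: JobSpec
    state: str = "NEW"
    exit_code: Optional[int] = None
    output: str = ""


class Executor(ABC):
    """Common interface that every backend implements."""

    @abstractmethod
    def submit(self, job: Job) -> None:
        ...


class LocalExecutor(Executor):
    """Runs the job as a child process and waits for it to finish."""

    def submit(self, job: Job) -> None:
        job.state = "ACTIVE"
        result = subprocess.run(
            [job.spec.executable, *job.spec.arguments],
            capture_output=True,
            text=True,
        )
        job.exit_code = result.returncode
        job.output = result.stdout
        job.state = "COMPLETED" if result.returncode == 0 else "FAILED"


class DryRunBatchExecutor(Executor):
    """Stand-in for a batch scheduler: renders a script instead of submitting it."""

    def submit(self, job: Job) -> None:
        command = shlex.join([job.spec.executable, *job.spec.arguments])
        job.output = "#!/bin/bash\n" + command + "\n"
        job.state = "QUEUED"


# Registry of backends; an assumption of this sketch, not a PSI/J structure.
EXECUTORS: Dict[str, Type[Executor]] = {
    "local": LocalExecutor,
    "dry-run-batch": DryRunBatchExecutor,
}


def get_executor(name: str) -> Executor:
    """Look up a backend by name so calling code never names a scheduler class."""
    return EXECUTORS[name]()


if __name__ == "__main__":
    spec = JobSpec(sys.executable, ["-c", "print('hello from a job')"])
    for backend in ("local", "dry-run-batch"):
        job = Job(spec)
        get_executor(backend).submit(job)
        print(backend, job.state)
```

In this sketch the calling code builds one `JobSpec` and changes only the backend name, which is the kind of portability the abstract describes. A real implementation would also need asynchronous status updates, resource requests, and cancellation, which are omitted here.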
Taxonomy
Topics: Distributed and Parallel Computing Systems · Scientific Computing and Data Management · Cloud Computing and Resource Management
