MPI Windows on Storage for HPC Applications
Sergio Rivas-Gomez, Roberto Gioiosa, Ivy Bo Peng, Gokcen Kestor, Sai Narasimhamurthy, Erwin Laure, Stefano Markidis

TL;DR
This paper introduces MPI storage windows as a unified interface for programming memory and storage in HPC, enabling efficient out-of-core execution, parallel I/O, and fault-tolerance with acceptable performance trade-offs.
Contribution
It presents the design, implementation, and evaluation of MPI storage windows, integrating heterogeneous memory and storage for HPC applications.
Findings
For large, irregular memory operations, MPI windows on local storage incur a 55% performance penalty on average.
On a Lustre parallel file system, performance is asymmetric, with over 90% degradation in write operations.
Real-world applications show the penalty of MPI windows on storage can be negligible.
Abstract
Upcoming HPC clusters will feature hybrid memories and storage devices per compute node. In this work, we propose to use the MPI one-sided communication model and MPI windows as a unique interface for programming memory and storage. We describe the design and implementation of MPI storage windows, and present their benefits for out-of-core execution, parallel I/O and fault-tolerance. In addition, we explore the integration of heterogeneous window allocations, where memory and storage share a unified virtual address space. When performing large, irregular memory operations, we verify that MPI windows on local storage incur a 55% performance penalty on average. When using a Lustre parallel file system, asymmetric performance is observed, with over 90% degradation in write operations. Nonetheless, experimental results of a Distributed Hash Table, the HACC I/O kernel mini-application, and a…
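The idea of addressing storage through the same load/store interface as memory can be illustrated with a small, single-process sketch. It uses no MPI and is not the paper's implementation: it assumes that a memory-mapped file is a reasonable stand-in for a storage-backed window, and the helper names (`open_storage_window`, `put`, `get`, `demo`) are hypothetical, chosen only for this illustration.

```python
import mmap
import os
import struct
import tempfile


def open_storage_window(path, size):
    """Map a file of `size` bytes into the process address space.

    Hypothetical helper: stands in for allocating a window on storage.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        os.ftruncate(fd, size)      # make sure the file covers the mapping
        return mmap.mmap(fd, size)  # shared, read-write mapping by default
    finally:
        os.close(fd)                # mmap keeps its own duplicate descriptor


def put(window, offset, value):
    """Store a 64-bit integer at a byte offset, as if writing to memory."""
    struct.pack_into("<q", window, offset, value)


def get(window, offset):
    """Load a 64-bit integer from a byte offset, as if reading from memory."""
    return struct.unpack_from("<q", window, offset)[0]


def demo():
    """Write through one mapping, then read the value back through a new one."""
    path = os.path.join(tempfile.mkdtemp(), "window.bin")

    win = open_storage_window(path, 4096)
    put(win, 64, 12345)
    win.flush()   # push dirty pages to the file, loosely like a window sync
    win.close()

    # Reopening the same file recovers the data, which is the property
    # that makes a storage-backed region usable for checkpointing.
    win = open_storage_window(path, 4096)
    value = get(win, 64)
    win.close()
    return value
```

Calling `demo()` writes a value through one mapping and reads it back through a second, independent mapping of the same file. That loosely mirrors how a window on storage could serve both as extra capacity for out-of-core data and as a persistent copy for fault tolerance.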
Taxonomy
Topics: Advanced Data Storage Technologies · Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques
