Leveraging MPI-3 Shared-Memory Extensions for Efficient PGAS Runtime Systems
Huan Zhou, Kamran Idrees, José Gracia

TL;DR
This paper introduces an optimized PGAS runtime system that uses MPI-3 shared-memory extensions to speed up intra-node communication, achieving intra-node performance comparable to low-level RMA libraries and inter-node performance comparable to MPI-3.
Contribution
The paper presents a novel hybrid runtime system that combines MPI-3 shared-memory extensions with MPI-3 one-sided communication for efficient intra- and inter-node communication.
Findings
Intra-node communication performance matches low-level RMA libraries.
Inter-node communication performance is comparable to that of plain MPI-3 inter-node communication.
Hybrid runtime system improves PGAS communication efficiency.
Abstract
The relaxed semantics and rich functionality of the one-sided communication primitives of MPI-3 make MPI an attractive candidate for the implementation of PGAS models. However, the performance of such implementations suffers from the fact that current MPI RMA implementations typically have a large overhead when the source and target of a communication request share a common, local physical memory. In this paper, we present an optimized PGAS-like runtime system which uses the new MPI-3 shared-memory extensions to serve intra-node communication requests and MPI-3 one-sided communication primitives to serve inter-node communication requests. The performance of our runtime system is evaluated on a Cray XC40 system through low-level communication benchmarks, a random-access benchmark and a stencil kernel. The results of the experiments demonstrate that the performance of our hybrid runtime system…
