Asynchronous MPI for the Masses
Markus Wittmann, Georg Hager, Thomas Zeiser, Gerhard Wellein

TL;DR
The paper introduces a simple library that gives existing MPI implementations truly asynchronous non-blocking point-to-point operations, independent of the underlying communication infrastructure.
Contribution
A portable library that adds asynchronous non-blocking point-to-point communication to existing MPI implementations, built on the MPI profiling interface (PMPI) and the MPI_THREAD_MULTIPLE thread support level.
Findings
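Performance comparisons on a commodity InfiniBand cluster and two tier-1 systems in Germany, using low-level and application benchmarks.
Works with current versions of Intel MPI, Open MPI, MPICH2, MVAPICH2, Cray MPI, and IBM MPI.
Discussion of thread/process placement, implementation-specific peculiarities, and which MPI libraries already support asynchronous operations.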
Abstract
We present a simple library which equips MPI implementations with truly asynchronous non-blocking point-to-point operations, and which is independent of the underlying communication infrastructure. It utilizes the MPI profiling interface (PMPI) and the MPI_THREAD_MULTIPLE thread compatibility level, and works with current versions of Intel MPI, Open MPI, MPICH2, MVAPICH2, Cray MPI, and IBM MPI. We show performance comparisons on a commodity InfiniBand cluster and two tier-1 systems in Germany, using low-level and application benchmarks. Issues of thread/process placement and the peculiarities of different MPI implementations are discussed in detail. We also identify the MPI libraries that already support asynchronous operations. Finally we show how our ideas can be extended to MPI-IO.
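The core idea in the abstract, a helper thread that keeps a non-blocking operation moving while the application does other work, can be modeled without any MPI calls. The sketch below is only an illustration of that idea, not the authors' library: `transfer_t`, `progress_once`, `transfer_start`, `transfer_pending`, and `transfer_wait` are hypothetical names, and the "message" is just a counter of chunks that a modeled progress engine moves one at a time.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a pending non-blocking operation.
 * No MPI is involved: the message is a number of chunks, and each
 * pass through the modeled progress engine moves at most one chunk. */
typedef struct {
    atomic_int remaining;   /* chunks still to be transferred */
    pthread_t  helper;      /* progress thread, if one was started */
    int        has_helper;  /* nonzero while `helper` must be joined */
} transfer_t;

/* One pass through the modeled progress engine.
 * Returns 1 once the transfer is complete, 0 otherwise. */
int progress_once(transfer_t *t)
{
    int left = atomic_load(&t->remaining);
    /* Move one chunk; on a failed exchange `left` is reloaded and we retry. */
    while (left > 0 &&
           !atomic_compare_exchange_weak(&t->remaining, &left, left - 1)) {
    }
    return left <= 1;       /* nothing was left, or we moved the last chunk */
}

/* Body of the helper thread: drive progress until the transfer is done. */
void *helper_main(void *arg)
{
    transfer_t *t = arg;
    while (!progress_once(t))
        sched_yield();
    return NULL;
}

/* Start a transfer of `chunks` chunks. With use_helper != 0 a progress
 * thread is spawned; otherwise nothing moves until transfer_wait(). */
int transfer_start(transfer_t *t, int chunks, int use_helper)
{
    assert(chunks >= 0);
    atomic_init(&t->remaining, chunks);
    t->has_helper = 0;
    if (use_helper) {
        if (pthread_create(&t->helper, NULL, helper_main, t) != 0)
            return -1;
        t->has_helper = 1;
    }
    return 0;
}

/* Chunks still pending, read without driving any progress. */
int transfer_pending(transfer_t *t)
{
    return atomic_load(&t->remaining);
}

/* Block until the transfer is complete. */
void transfer_wait(transfer_t *t)
{
    if (t->has_helper) {
        pthread_join(t->helper, NULL);  /* helper already did the work */
        t->has_helper = 0;
    } else {
        while (!progress_once(t)) {     /* all progress happens here */
        }
    }
}
```

In this model, a transfer started with `use_helper == 0` makes no progress until `transfer_wait` is called, which mirrors a non-blocking call that only advances inside later library calls. Started with `use_helper != 0`, the same transfer can finish while the caller computes, and `transfer_wait` merely joins the helper. On most systems the file builds with `cc -pthread`.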
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Interconnection Networks and Systems
