Collectives in hybrid MPI+MPI code: design, practice and performance
Huan Zhou, Jose Gracia, Naweiluo Zhou, Ralf Schneider

TL;DR
This paper introduces a new design method for hybrid MPI+MPI collective communication that reduces on-node memory overhead and improves performance, validated through benchmarks and computational kernels.
Contribution
The paper proposes a design approach for hybrid MPI+MPI collective operations, including wrapper primitives and best practices, aimed at improving efficiency over pure MPI collectives.
Findings
Micro-benchmarks show comparable or better performance than pure MPI.
Validated effectiveness in three computational kernels.
Reduces on-node communication overheads.
Abstract
The use of a hybrid scheme that combines message-passing programming models for inter-node parallelism with shared-memory programming models for node-level parallelism is widespread. Extensive existing practice with hybrid Message Passing Interface (MPI) plus Open Multi-Processing (OpenMP) programming accounts for its popularity. Nevertheless, substantial programming effort is required to gain performance benefits from MPI+OpenMP code. An emerging hybrid method that combines MPI with the MPI shared memory model (MPI+MPI) is promising. However, writing an efficient hybrid MPI+MPI program, especially when collective communication operations are involved, cannot be taken for granted. In this paper, we propose a new design method to implement hybrid MPI+MPI context-based collective communication operations. Our method avoids on-node memory replications (on-node communication…
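The memory-replication point in the abstract can be made concrete with a rough calculation. The sketch below is not taken from the paper: it assumes a simple model of an allgather-style collective in which, under pure MPI, every process on a node keeps its own copy of the gathered result, while under MPI+MPI the processes on a node share a single copy. The function name, the process counts, and the message size are all hypothetical values chosen for illustration.

```python
def result_bytes_per_node(procs_per_node, num_nodes, bytes_per_proc, shared):
    """Bytes one node holds for the gathered result under a simplified model.

    Assumption (not from the paper): the gathered result is one contribution
    of `bytes_per_proc` from every process in the job.
    """
    total_procs = procs_per_node * num_nodes
    result_bytes = total_procs * bytes_per_proc
    # Pure MPI: one private copy per process. MPI+MPI: one shared copy per node.
    copies = 1 if shared else procs_per_node
    return copies * result_bytes


# Hypothetical configuration: 4 nodes, 24 processes each, 8 KiB per process.
pure_mpi = result_bytes_per_node(24, 4, 8 * 1024, shared=False)
hybrid = result_bytes_per_node(24, 4, 8 * 1024, shared=True)

print(f"pure MPI : {pure_mpi} bytes per node")
print(f"MPI+MPI  : {hybrid} bytes per node")
print(f"reduction: {pure_mpi // hybrid}x")
```

Under this model the per-node footprint of the result shrinks by a factor equal to the number of processes per node. Real implementations add synchronization and window-management costs that this arithmetic ignores.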
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Advanced Data Storage Technologies
