TL;DR
This paper models, implements, and evaluates MPI partitioned communication, showing performance improvements in multithreaded and large-message scenarios after optimizing the MPICH implementation.
Contribution
It provides a performance model, implementation enhancements in MPICH, and an extensive evaluation of partitioned communication's benefits in MPI applications.
Findings
Partitioned communication reduces contention and overhead in multithreaded environments.
Exploiting communication delays improves performance with large messages.
Proposed solutions mitigate penalties for small partition sizes.
Abstract
Partitioned communication was introduced in MPI 4.0 as a user-friendly interface to support pipelined communication patterns, particularly common in the context of MPI+threads. It provides the user with the ability to divide a global buffer into smaller independent chunks, called partitions, which can then be communicated independently. In this work we first model the performance gain that can be expected when using partitioned communication. Next, we describe the improvements we made to MPICH to enable those gains and provide a high-quality implementation of MPI partitioned communication. We then evaluate partitioned communication in various common use cases and assess the performance in comparison with other MPI point-to-point and one-sided approaches. Specifically, we first investigate two scenarios commonly encountered for small partition sizes in a multithreaded environment:…
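To make the trade-off described in the abstract concrete, the following is a minimal toy sketch and not the performance model from the paper. It assumes a simple latency-plus-bandwidth cost per message and a single link that serializes transfers. The function names (`bulk_time`, `partitioned_time`) and all parameters are hypothetical and chosen only for illustration.

```python
def bulk_time(ready_times, partition_bytes, latency, bandwidth):
    """Completion time when the whole buffer is sent as one message.

    Assumption: the send can only start once the slowest thread has
    finished filling its part of the buffer.
    """
    total_bytes = partition_bytes * len(ready_times)
    return max(ready_times) + latency + total_bytes / bandwidth


def partitioned_time(ready_times, partition_bytes, latency, bandwidth):
    """Completion time when each partition is sent as soon as it is ready.

    Assumption: one link, transfers are serialized, and every partition
    pays the full per-message latency.
    """
    link_free = 0.0
    for ready in sorted(ready_times):
        start = max(ready, link_free)  # wait for the data and for the link
        link_free = start + latency + partition_bytes / bandwidth
    return link_free


if __name__ == "__main__":
    # Threads finish at staggered times: early partitions overlap with compute.
    staggered = [1.0, 2.0, 3.0, 4.0]
    print(bulk_time(staggered, 4, 0.5, 4.0), partitioned_time(staggered, 4, 0.5, 4.0))

    # All threads finish together: the per-partition latency is pure overhead.
    together = [0.0, 0.0, 0.0, 0.0]
    print(bulk_time(together, 4, 0.5, 4.0), partitioned_time(together, 4, 0.5, 4.0))
```

With these illustrative numbers, the staggered case finishes at 7.0 with partitions versus 8.5 with a single message, while the simultaneous case finishes at 6.0 versus 4.5. Under this toy cost model, partitioning helps when threads become ready at different times and hurts when the per-partition overhead dominates. In a real MPI 4.0 program the corresponding calls would be `MPI_Psend_init`, `MPI_Precv_init`, `MPI_Pready`, and `MPI_Parrived`.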