Implementing True MPI Sessions and Evaluating MPI Initialization Scalability
Hui Zhou, Kenneth Raffenetti, Yanfei Guo, Michael Wilkins, and Rajeev Thakur

TL;DR
This paper discusses implementing true MPI Sessions in MPICH, including architectural refactoring, and evaluates their scalability benefits over traditional MPI models.
Contribution
It introduces a major internal refactoring in MPICH to fully support MPI Sessions, enabling scalable, hierarchical communication models.
Findings
True Sessions provide significant scalability improvements.
Architectural changes enable decoupling from MPI_COMM_WORLD.
Hierarchical designs improve performance on exascale systems.
Abstract
Sessions is one of the major features introduced in the MPI-4 standard. It offers an alternative to the traditional world communicator model by allowing applications to construct communicators from process sets, thereby eliminating the dependency on MPI_COMM_WORLD. The Sessions model was proposed as a more scalable solution for exascale systems, where MPI_COMM_WORLD was viewed as a potential scalability bottleneck. However, supporting Sessions is a significant challenge for established codebases like MPICH because the world model is deeply integrated into traditional MPI implementations. Although MPICH added support for the MPI-4 standard upon its release, it still relied internally on a global world communicator. This approach enabled applications written using the Sessions model to function, but it did not fulfill the full design intent of Sessions, which was meant to decouple MPI from…
