Leveraging MPI RMA to optimise halo-swapping communications in MONC on Cray machines
Nick Brown, Michael Bareford, Michèle Weiland

TL;DR
This paper demonstrates that replacing traditional point-to-point communication with MPI RMA in the MONC atmospheric model can reduce communication time by 5-10% on large-scale Cray systems, improving overall performance.
Contribution
It provides a detailed case study of implementing MPI RMA in a real-world atmospheric model, highlighting performance benefits and integration challenges.
Findings
RMA reduces communication time at each timestep by 5-10% on up to 32768 cores.
Successful integration of MPI RMA requires specific optimizations.
Library support for RMA is not yet fully mature.
Abstract
Remote Memory Access (RMA), also known as single-sided communications, provides a way of accessing the memory of other processes without having to issue explicit message-passing-style communication calls. Previous studies have concluded that MPI RMA can provide increased performance over traditional MPI Point to Point (P2P), but these are based on synthetic benchmarks. In this work, we replace the existing non-blocking P2P communication calls in the MONC atmospheric model with MPI RMA. We describe our approach in detail and discuss options taken for correctness and performance. Experiments on Cray machines illustrate that by using RMA we can obtain between a 5% and 10% reduction in communication time at each timestep on up to 32768 cores, which over the entirety of a run (of many timesteps) results in a significant improvement in performance compared to P2P. However, RMA is not a silver bullet and…
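As a rough illustration of what a one-sided halo swap means, the following is a toy, single-process Python model. It is not MPI code and not MONC's implementation: ranks are simulated as plain lists, the hypothetical `put` helper stands in for a one-sided operation such as `MPI_Put`, and the 1D periodic decomposition with a halo depth of 1 is an assumption chosen for brevity.

```python
# Toy, single-process model of a one-sided halo swap (NOT real MPI, NOT MONC code).
# All names here (HALO, make_ranks, put, halo_swap) are hypothetical.

HALO = 1  # assumed halo depth on each side of a rank's local data


def make_ranks(n_ranks, n_local):
    """Each simulated rank owns n_local interior cells plus HALO cells per side."""
    ranks = []
    for r in range(n_ranks):
        interior = [r * n_local + i for i in range(n_local)]
        ranks.append([None] * HALO + interior + [None] * HALO)
    return ranks


def put(windows, target, offset, values):
    """Stand-in for a one-sided put: the origin writes directly into the
    target's exposed memory (its 'window'); the target issues no receive."""
    windows[target][offset:offset + len(values)] = values


def halo_swap(windows):
    """One access epoch: every rank puts its edge cells into its neighbours' halos."""
    n = len(windows)
    for r, local in enumerate(windows):
        left, right = (r - 1) % n, (r + 1) % n      # periodic neighbours
        first = local[HALO:2 * HALO]                # leftmost interior cells
        last = local[-2 * HALO:-HALO]               # rightmost interior cells
        # Write into the left neighbour's right halo and the right neighbour's left halo.
        put(windows, left, len(windows[left]) - HALO, first)
        put(windows, right, 0, last)
    # In real RMA, a synchronisation call would close the epoch at this point
    # before any rank reads its halo cells.


ranks = make_ranks(4, 3)
halo_swap(ranks)
print(ranks[1])  # [2, 3, 4, 5, 6]
```

The point of the sketch is only the communication pattern: the origin rank names the destination location itself, so the target side has no matching receive call, in contrast to a P2P swap where each send is paired with a receive.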
