Co-Design and Evaluation of a CPU-Free MPI GPU Communication Abstraction and Implementation
Patrick G. Bridges (University of New Mexico), Derek Schafer (University of New Mexico), Jack Lange (Oak Ridge National Laboratory), James B. White III (Oak Ridge National Laboratory), Anthony Skjellum (Tennessee Technological University)

TL;DR
This paper presents a novel MPI-based GPU communication API that eliminates CPU involvement, significantly reducing latency and improving scalability for GPU-based HPC and ML applications.
Contribution
It introduces a CPU-free MPI GPU communication API leveraging new network capabilities, enabling high-performance, easy-to-use GPU communication without CPU synchronization.
Findings
Up to 50% reduction in medium-message latency.
28% speedup in a halo-exchange benchmark on 8,192 GPUs.
Effective integration with the Cabana/Kokkos framework.
Abstract
Removing the CPU from the communication fast path is essential to efficient GPU-based ML and HPC application performance. However, existing GPU communication APIs either continue to rely on the CPU for communication or rely on APIs that place significant synchronization burdens on programmers. In this paper we describe the design, implementation, and evaluation of an MPI-based GPU communication API enabling easy-to-use, high-performance, CPU-free communication. This API builds on previously proposed MPI extensions and leverages HPE Slingshot 11 network card capabilities. We demonstrate the utility and performance of the API by showing how the API naturally enables CPU-free gather/scatter halo exchange communication primitives in the Cabana/Kokkos performance portability framework, and through a performance comparison with Cray MPICH on the Frontier and Tuolumne supercomputers. Results…
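To make the gather/scatter halo exchange pattern mentioned in the abstract concrete, here is a minimal single-process sketch in Python. It is only an illustration of the general pattern (pack boundary cells, move them to neighbours, unpack into ghost cells). It does not use MPI, GPUs, Cabana, or the paper's API, and every name in it (`gather`, `scatter`, `halo_exchange`, `demo`) is hypothetical. The 1D periodic ring and the list-based "ranks" are assumptions chosen to keep the example short.

```python
# Illustrative sketch only: ranks are simulated as Python lists in one process.
# None of these names come from the paper, Cabana/Kokkos, or any MPI library.

def gather(field, indices):
    """Pack the owned cells listed in `indices` into a contiguous send buffer."""
    return [field[i] for i in indices]


def scatter(field, indices, buffer):
    """Unpack a received buffer into the ghost cells listed in `indices`."""
    for i, value in zip(indices, buffer):
        field[i] = value


def halo_exchange(fields, halo=1):
    """Exchange `halo` boundary cells between neighbours on a periodic 1D ring.

    Each field is laid out as [left ghosts | owned cells | right ghosts].
    """
    n = len(fields)
    size = len(fields[0])
    left_owned = list(range(halo, 2 * halo))
    right_owned = list(range(size - 2 * halo, size - halo))
    left_ghost = list(range(0, halo))
    right_ghost = list(range(size - halo, size))

    # Gather phase: every rank packs its boundary cells into send buffers.
    to_left = [gather(f, left_owned) for f in fields]
    to_right = [gather(f, right_owned) for f in fields]

    # Transfer and scatter phase: a real implementation would hand the packed
    # buffers to the network here; this sketch just indexes the neighbour.
    for rank, f in enumerate(fields):
        scatter(f, left_ghost, to_right[(rank - 1) % n])
        scatter(f, right_ghost, to_left[(rank + 1) % n])
    return fields


def demo():
    # Three simulated ranks, two owned cells each, one ghost cell per side.
    fields = [[None, 10 * r + 1, 10 * r + 2, None] for r in range(3)]
    return halo_exchange(fields)
```

Calling `demo()` fills each simulated rank's ghost cells with its neighbours' boundary values. In a GPU setting, the gather and scatter steps would typically run as device kernels, and the point of a CPU-free design is that the transfer between them does not require the host to synchronize.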
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Network Packet Processing and Optimization
