Understanding GPU Triggering APIs for MPI+X Communication
Patrick G. Bridges, Anthony Skjellum, Evan D. Suggs, Derek Schafer, Purushotham V. Bangalore

TL;DR
This paper analyzes various GPU-triggered MPI communication APIs, comparing their design space, semantics, and performance implications to promote community convergence and guide future standardization efforts.
Contribution
The paper provides a comprehensive taxonomy and analysis of existing GPU-triggered MPI abstractions, highlighting their goals, differences, and potential for standardization.
Findings
Identifies common goals and differences among GPU-triggered MPI APIs.
Highlights the potential performance benefits of kernel and stream-triggered communication.
Discusses semantic and functional gaps in current abstractions.
Abstract
GPU-enhanced architectures are now dominant in HPC systems, but message-passing communication involving GPUs with MPI has proven to be both complex and expensive, motivating new approaches that lower such costs. We compare and contrast stream/graph- and kernel-triggered MPI communication abstractions, whose principal purpose is to enhance the performance of communication when GPU kernels create or consume data for transfer through MPI operations. Researchers and practitioners have proposed multiple potential APIs for stream and/or kernel triggering that span various GPU architectures and approaches, including MPI-4 partitioned point-to-point communication, stream communicators, and explicit MPI stream/queue objects. Designs breaking backward compatibility with MPI are duly noted. Some of these strengthen or weaken the semantics of MPI operations. A key contribution of this paper is to…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Embedded Systems Design Techniques
