Inter-APU Communication on AMD MI300A Systems via Infinity Fabric: a Deep Dive
Gabin Schieffer, Jacob Wahlgren, Ruimin Shi, Edgar A. León, Roger Pearce, Maya Gokhale, Ivy Peng

TL;DR
This paper evaluates inter-APU communication over Infinity Fabric on AMD MI300A systems, analyzing data movement and programming interfaces and optimizing HPC applications for improved performance.
Contribution
It provides a comprehensive analysis of inter-APU communication mechanisms and offers optimization strategies for multi-APU AMD systems with Infinity Fabric.
Findings
Direct GPU memory access performance insights
Efficiency comparison of HIP, MPI, RCCL APIs
Optimized HPC applications on multi-APU systems
Abstract
The ever-increasing compute performance of GPU accelerators drives up the need for efficient data movements within HPC applications to sustain performance. Proposed as a solution to alleviate CPU-GPU data movement, AMD MI300A Accelerated Processing Unit (APU) combines CPU, GPU, and high-bandwidth memory (HBM) within a single physical package. Leadership supercomputers, such as El Capitan, group four APUs within a single compute node, using Infinity Fabric Interconnect. In this work, we design specific benchmarks to evaluate direct memory access from the GPU, explicit inter-APU data movement, and collective multi-APU communication. We also compare the efficiency of HIP APIs, MPI routines, and the GPU-specialized RCCL library. Our results highlight key design choices for optimizing inter-APU communication on multi-APU AMD MI300A systems with Infinity Fabric, including programming…
