Inter-APU Communication on AMD MI300A Systems via Infinity Fabric: a Deep Dive

Gabin Schieffer; Jacob Wahlgren; Ruimin Shi; Edgar A. León; Roger Pearce; Maya Gokhale; Ivy Peng

arXiv:2508.11298 · cs.DC · August 19, 2025

TL;DR

This paper evaluates inter-APU communication over Infinity Fabric on AMD MI300A systems: it analyzes data movement and programming interfaces, and optimizes HPC applications for better performance.

Contribution

It provides a comprehensive analysis of inter-APU communication mechanisms and offers optimization strategies for multi-APU AMD systems with Infinity Fabric.

Findings

01 Direct GPU memory access performance insights

02 Efficiency comparison of the HIP, MPI, and RCCL APIs

03 Optimized HPC applications on multi-APU systems

Abstract

The ever-increasing compute performance of GPU accelerators drives up the need for efficient data movement within HPC applications to sustain performance. Proposed as a solution to alleviate CPU-GPU data movement, the AMD MI300A Accelerated Processing Unit (APU) combines CPU, GPU, and high-bandwidth memory (HBM) within a single physical package. Leadership supercomputers, such as El Capitan, group four APUs within a single compute node, connected by the Infinity Fabric interconnect. In this work, we design specific benchmarks to evaluate direct memory access from the GPU, explicit inter-APU data movement, and collective multi-APU communication. We also compare the efficiency of HIP APIs, MPI routines, and the GPU-specialized RCCL library. Our results highlight key design choices for optimizing inter-APU communication on multi-APU AMD MI300A systems with Infinity Fabric, including programming…
