Understanding Data Movement in AMD Multi-GPU Systems with Infinity Fabric
Gabin Schieffer, Ruimin Shi, Stefano Markidis, Andreas Herten, Jennifer Faj, Ivy Peng

TL;DR
This paper investigates data movement performance in AMD multi-GPU systems using Infinity Fabric, proposing a methodology to evaluate communication strategies and demonstrating that direct peer-to-peer access and RCCL outperform MPI in latency and bandwidth.
Contribution
The paper introduces a test and evaluation methodology for data movement in AMD multi-GPU systems and shows the performance benefits of direct peer-to-peer communication and RCCL over MPI.
Findings
Peer-to-peer memory access reduces latency.
RCCL outperforms MPI in bandwidth.
Evaluation methodology aids in optimizing multi-GPU communication.
Abstract
Modern GPU systems are constantly evolving to meet the needs of computing-intensive applications in scientific and machine learning domains. However, there is typically a gap between the hardware capacity and the achievable application performance. This work aims to provide a better understanding of the Infinity Fabric interconnects on AMD GPUs and CPUs. We propose a test and evaluation methodology for characterizing the performance of data movements on multi-GPU systems, stressing different communication options on AMD MI250X GPUs, including point-to-point and collective communication, and memory allocation strategies between GPUs, as well as the host CPU. In a single-node setup with four GPUs, we show that direct peer-to-peer memory accesses between GPUs and utilization of the RCCL library outperform MPI-based solutions in terms of memory/communication latency and bandwidth. Our test…
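As an illustration of how latency and bandwidth figures of the kind discussed above can be derived from timed transfers, the sketch below fits the simple linear cost model t = latency + size / bandwidth to a set of (message size, elapsed time) pairs. This model and the function names `effective_bandwidth` and `fit_latency_bandwidth` are assumptions made for this example, not the authors' methodology, and the sample numbers are synthetic rather than measurements from the paper.

```python
from statistics import mean


def effective_bandwidth(size_bytes, time_s):
    """Bytes per second achieved by a single timed transfer."""
    if time_s <= 0:
        raise ValueError("elapsed time must be positive")
    return size_bytes / time_s


def fit_latency_bandwidth(sizes_bytes, times_s):
    """Least-squares fit of t = latency + size / bandwidth.

    Returns (latency_seconds, bandwidth_bytes_per_second).
    This linear model is an assumption of the sketch; real transfers
    may deviate from it, for example at protocol switch-over sizes.
    """
    if len(sizes_bytes) != len(times_s) or len(sizes_bytes) < 2:
        raise ValueError("need at least two (size, time) pairs")
    mean_size = mean(sizes_bytes)
    mean_time = mean(times_s)
    # Ordinary least squares for a line: slope = cov(x, y) / var(x).
    var_size = sum((s - mean_size) ** 2 for s in sizes_bytes)
    cov = sum((s - mean_size) * (t - mean_time)
              for s, t in zip(sizes_bytes, times_s))
    if var_size == 0 or cov <= 0:
        raise ValueError("data does not support a positive-bandwidth fit")
    slope = cov / var_size                   # seconds per byte
    latency = mean_time - slope * mean_size  # intercept, in seconds
    return latency, 1.0 / slope


# Synthetic example: 2 microseconds of latency and 50 GB/s of bandwidth.
# These values are made up for illustration and are not from the paper.
sizes = [2 ** k for k in range(10, 27, 4)]
times = [2e-6 + s / 50e9 for s in sizes]
latency, bandwidth = fit_latency_bandwidth(sizes, times)
```

Fitting across many message sizes separates the fixed per-transfer cost from the per-byte cost, whereas `effective_bandwidth` on a single small message mostly reflects latency.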
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Interconnection Networks and Systems
