Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects
Daniele De Sensi, Lorenzo Pichetti, Flavio Vella, Tiziano De Matteis, Zebin Ren, Luigi Fusco, Matteo Turisini, Daniele Cesarini, Kurt Lust, Animesh Trivedi, Duncan Roweth, Filippo Spiga, Salvatore Di Girolamo, Torsten Hoefler

TL;DR
This paper characterizes GPU-to-GPU communication in supercomputers, revealing untapped bandwidth and optimization opportunities across different architectures to improve performance and efficiency.
Contribution
It provides a comprehensive performance analysis of intra-node and inter-node GPU interconnects on three supercomputers, offering practical insights for optimization.
Findings
Untapped bandwidth exists in current GPU interconnects.
Performance bottlenecks vary across architectures.
Opportunities for software and network optimization are significant.
Abstract
Multi-GPU nodes are increasingly common in the rapidly evolving landscape of exascale supercomputers. On these systems, GPUs on the same node are connected through dedicated networks, with bandwidths up to a few terabits per second. However, gauging performance expectations and maximizing system efficiency is challenging due to different technologies, design options, and software layers. This paper comprehensively characterizes three supercomputers - Alps, Leonardo, and LUMI - each with a unique architecture and design. We focus on performance evaluation of intra-node and inter-node interconnects on up to 4096 GPUs, using a mix of intra-node and inter-node benchmarks. By analyzing its limitations and opportunities, we aim to offer practical guidance to researchers, system architects, and software developers dealing with multi-GPU supercomputing. Our results show that there is untapped…
