GPU peer-to-peer techniques applied to a cluster interconnect
Roberto Ammendola, Massimo Bernaschi, Andrea Biagioni, Mauro Bisson, Massimiliano Fatica, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Enrico Mastrostefano, Pier Stanislao Paolucci, Davide Rossetti, Francesco Simula, Laura Tosoratto, Piero Vicini

TL;DR
This paper explores enabling direct GPU-to-GPU data exchange over a cluster interconnect by modifying hardware and software, demonstrating performance gains in benchmarks and applications.
Contribution
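It details the architectural modifications needed for peer-to-peer access to NVIDIA Fermi- and Kepler-class GPUs on an FPGA-based cluster interconnect, and a software implementation that minimally extends the RDMA programming model, including issues that arise when using it from a higher-level API such as MPI.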
Findings
Performance improvements measured on low-level benchmarks
Improved data transfer efficiency in two GPU-accelerated applications, obtained by avoiding staging to host memory
Current limits of the peer-to-peer technique identified
Abstract
Modern GPUs support special protocols to exchange data directly across the PCI Express bus. While these protocols could be used to reduce GPU data transmission times, basically by avoiding staging to host memory, they require specific hardware features which are not available on current-generation network adapters. In this paper we describe the architectural modifications required to implement peer-to-peer access to NVIDIA Fermi- and Kepler-class GPUs on an FPGA-based cluster interconnect. We also discuss the current software implementation, which integrates this feature by minimally extending the RDMA programming model, as well as some issues raised while employing it in a higher-level API like MPI. Finally, the current limits of the technique are studied by analyzing the performance improvements on low-level benchmarks and on two GPU-accelerated applications, showing when and…
Taxonomy
Topics: Interconnection Networks and Systems · Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies
