GPU-Initiated Networking for NCCL
Khaled Hamidouche (1), John Bachan (1), Pak Markthub (1), Peter-Jan Gootzen (1), Elena Agostini (1), Sylvain Jeaugey (1), Aamir Shafi (1), Georgios Theodorakis (1), Manjunath Gorentla Venkata (1) ((1) NVIDIA Corporation)

TL;DR
This paper introduces GPU-Initiated Networking (GIN) in NCCL 2.28, enabling direct GPU-to-network communication to reduce latency and improve efficiency for AI workloads like Mixture-of-Experts.
Contribution
It presents the GIN architecture and APIs, allowing device-initiated communication that integrates seamlessly with NCCL and supports various hardware backends.
Findings
GIN reduces communication latency in MoE workloads.
Integration with DeepEP demonstrates practical benefits.
Benchmarking confirms low-latency performance with GIN.
Abstract
Modern AI workloads, especially Mixture-of-Experts (MoE) architectures, increasingly demand low-latency, fine-grained GPU-to-GPU communication with device-side control. Traditional GPU communication follows a host-initiated model, where the CPU orchestrates all communication operations, a characteristic of the CUDA runtime. Although this model is robust for collective operations, applications requiring tight integration of computation and communication can benefit from device-initiated communication that eliminates CPU coordination overhead. NCCL 2.28 introduces the Device API with three operation modes: Load/Store Accessible (LSA) for NVLink/PCIe, Multimem for NVLink SHARP, and GPU-Initiated Networking (GIN) for network RDMA. This paper presents the GIN architecture, design, and semantics, and highlights its impact on MoE communication. GIN builds on a three-layer architecture: i) NCCL Core host-side…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Cloud Computing and Resource Management
