From Skew to Symmetry: Node-Interconnect Multi-Path Balancing with Execution-time Planning for Modern GPU Clusters

Jinghan Yao; Kaushik Kandadi; Bharath Ramesh; Hari Subramoni; Dhabaleswar K. Panda

arXiv:2604.00317 · cs.DC · April 2, 2026

TL;DR

NIMBLE is a runtime system that dynamically balances traffic across GPU cluster links, significantly improving bandwidth utilization and scalability for skewed communication patterns.

Contribution

NIMBLE introduces a capacity-normalized optimization and CUDA-aware pipelining to route traffic adaptively, without requiring application modifications.
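
To make the idea of capacity normalization concrete, here is a minimal sketch that compares even striping with a split proportional to each path's capacity. It is an illustration only, not the paper's formulation, which this summary does not show. The function names, the three-path topology, and the bandwidth figures are all hypothetical.

```python
# Illustrative sketch only: NOT the NIMBLE algorithm. All names and numbers
# below are hypothetical and chosen for easy arithmetic.

def split_even(size, capacities):
    """Stripe a message evenly across paths, ignoring their capacities."""
    n = len(capacities)
    return [size / n] * n


def split_capacity_normalized(size, capacities):
    """Give each path a share proportional to its capacity, so that
    share / capacity (the normalized load) is equal on every path."""
    total = sum(capacities)
    return [size * c / total for c in capacities]


def completion_time(shares, capacities):
    """The transfer finishes when the slowest path finishes."""
    return max(s / c for s, c in zip(shares, capacities))


# Hypothetical setup: one fast path and two slower paths (GB/s), 100 GB to move.
capacities = [50.0, 25.0, 25.0]
size = 100.0

even = split_even(size, capacities)
prop = split_capacity_normalized(size, capacities)

t_even = completion_time(even, capacities)
t_prop = completion_time(prop, capacities)

print("even split:", even, "time:", t_even)
print("capacity-normalized split:", prop, "time:", t_prop)
```

In this toy setup the even split is bounded by the slowest paths, while the proportional split finishes all paths at the same time. A real runtime would also have to account for contention from other flows and for per-message overheads, which this sketch ignores.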

Findings

01. Achieves up to 2.3x higher intra-node bandwidth.

02. Attains 3.8x higher inter-node throughput.

03. Outperforms NCCL and MPI by up to 5.2x on skewed workloads.

Abstract

Modern GPU-based high-performance computing clusters offer unprecedented communication bandwidth through heterogeneous intra-node interconnects and inter-node networks. However, despite this high aggregate bandwidth, many real-world communication patterns fail to fully utilize the available hardware. Traffic skew often leads to situations where a small subset of links becomes oversaturated while others remain underutilized, resulting in congestion, latency spikes, and poor scalability. Existing communication frameworks such as NCCL and MPI with UCX typically rely on static fastest-path routing or hashing-based multi-rail striping, which leaves significant bandwidth unused when runtime traffic deviates from expected distributions. To address these limitations, we propose NIMBLE (Node-Interconnect Multi-path Balancing with Execution-time orchestration), a runtime communication…
