SeqBalance: Congestion-Aware Load Balancing with no Reordering for RoCE
Huimin Luo, Jiao Zhang, Mingxuan Yu, Yongchen Pan, Tian Pan, Tao Huang

TL;DR
SeqBalance is a load balancing framework for RDMA in data center networks that improves flow completion time without causing packet reordering. It runs on existing commercial hardware and is validated through hardware testbed experiments and large-scale simulations.
Contribution
It introduces a congestion-aware, reordering-free load balancing scheme for RDMA that works with existing commercial RNICs and programmable switches and reduces flow completion time (FCT).
Findings
Improves average flow completion time (FCT) by 18.7% compared with existing load balancing schemes
Improves 99th-percentile FCT by 33.2%
Implemented on Mellanox CX-6 RNICs and Tofino switches
Abstract
Remote Direct Memory Access (RDMA) is widely used in data center networks because of its high performance. However, due to the characteristics of RDMA's retransmission strategy and the traffic mode of AI training, current load balancing schemes for data center networks are unsuitable for RDMA. In this paper, we propose SeqBalance, a load balancing framework designed for RDMA. SeqBalance implements fine-grained load balancing for RDMA through a reasonable design and does not cause reordering problems. SeqBalance's designs are all based on existing commercial RNICs and commercial programmable switches, so they are compatible with existing data center networks. We have implemented SeqBalance in Mellanox CX-6 RNICs and Tofino switches. The results of hardware testbed experiments and large-scale simulations show that compared with existing load balancing schemes, SeqBalance improves 18.7%…
Taxonomy
Topics: Distributed and Parallel Computing Systems · Network Traffic and Congestion Control · Interconnection Networks and Systems
