Gleam: An RDMA-accelerated Multicast Protocol for Datacenter Networks
Wenxue Li (1), Junyi Zhang (2), Gaoxiong Zeng (2), Yufei Liu (2), Zilong Wang (1), Chaoliang Zeng (1), Pengpeng Zhou (2), Qiaoling Wang (2), Kai Chen (1) ((1) Hong Kong University of Science and Technology, (2) Huawei Technologies Co., Ltd.)

TL;DR
Gleam is a novel RDMA-accelerated multicast protocol that enhances group communication in datacenter networks by supporting optimal forwarding and full RDMA capabilities, outperforming existing solutions.
Contribution
Gleam introduces a new multicast protocol that re-purposes RDMA RC logic with switch coordination, enabling efficient multicast and compatibility with commodity RNICs.
Findings
Gleam achieves 2.9X lower communication time in HPC benchmarks.
Gleam attains 2.7X higher data replication throughput.
Gleam demonstrates significant performance improvements over existing multicast solutions.
Abstract
RDMA has been widely adopted for high-speed datacenter networks. However, native RDMA merely supports one-to-one reliable connection, which mismatches various applications with group communication patterns (e.g., one-to-many). While there are some multicast enhancements to address it, they all fail to simultaneously achieve optimal multicast forwarding and fully unleash the distinguished RDMA capabilities. In this paper, we present Gleam, an RDMA-accelerated multicast protocol that simultaneously supports optimal multicast forwarding, efficient utilization of the prominent RDMA capabilities, and compatibility with the commodity RNICs. At its core, Gleam re-purposes the existing RDMA RC logic with careful switch coordination as an efficient multicast transport. Gleam performs the one-to-many connection maintenance and many-to-one feedback aggregation, based on an extended multicast…
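The many-to-one feedback aggregation mentioned in the abstract can be pictured with a small sketch. This is not Gleam's actual switch logic, which the truncated abstract does not spell out. It assumes a common approach in which a switch tracks the highest cumulative ACK sequence number seen from each receiver and forwards a single ACK to the sender only when the minimum across all receivers advances. The class name `AckAggregator`, its fields, and the receiver labels are hypothetical, and wraparound of packet sequence numbers is ignored for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AckAggregator:
    """Hypothetical per-group ACK state (illustrative only, not Gleam's design)."""

    receivers: frozenset                      # members of the multicast group
    acked: Dict[str, int] = field(default_factory=dict)  # highest ACK per receiver
    last_forwarded: int = -1                  # last sequence number sent upstream

    def on_ack(self, receiver: str, psn: int) -> Optional[int]:
        """Record a cumulative ACK; return a sequence number to forward, or None."""
        if receiver not in self.receivers:
            raise KeyError(f"unknown receiver: {receiver}")

        # Cumulative ACKs only move forward, so keep the maximum seen so far.
        self.acked[receiver] = max(self.acked.get(receiver, -1), psn)

        # Nothing can be acknowledged upstream until every receiver has reported.
        if len(self.acked) < len(self.receivers):
            return None

        # The group has collectively received everything up to the slowest member.
        group_psn = min(self.acked.values())
        if group_psn > self.last_forwarded:
            self.last_forwarded = group_psn
            return group_psn
        return None
```

Under this assumption the sender sees one ACK stream regardless of group size, which is what would let unmodified one-to-one reliable-connection logic on the sender treat the group as a single peer.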
Taxonomy
Topics: Cloud Computing and Resource Management · Software-Defined Networks and 5G · Interconnection Networks and Systems
