Revisiting Network Support for RDMA
Radhika Mittal, Alexander Shpiner, Aurojit Panda, Eitan Zahavi, Arvind Krishnamurthy, Sylvia Ratnasamy, Scott Shenker

TL;DR
This paper challenges the necessity of Priority Flow Control for RDMA over Ethernet, proposing an improved NIC design that eliminates PFC and enhances performance with minimal resource overhead.
Contribution
The paper introduces IRN (improved RoCE NIC), a NIC design that removes the PFC requirement for RDMA over Ethernet, and demonstrates performance gains at low implementation cost.
Findings
IRN outperforms RoCE with PFC by 6-83% in typical scenarios.
IRN eliminates the need for PFC, avoiding PFC-induced problems such as head-of-the-line blocking, congestion spreading, and occasional deadlocks.
Implementation overhead of IRN is about 3-10% of NIC resources.
Abstract
The advent of RoCE (RDMA over Converged Ethernet) has led to a significant increase in the use of RDMA in datacenter networks. To achieve good performance, RoCE requires a lossless network which is in turn achieved by enabling Priority Flow Control (PFC) within the network. However, PFC brings with it a host of problems such as head-of-the-line blocking, congestion spreading, and occasional deadlocks. Rather than seek to fix these issues, we instead ask: is PFC fundamentally required to support RDMA over Ethernet? We show that the need for PFC is an artifact of current RoCE NIC designs rather than a fundamental requirement. We propose an improved RoCE NIC (IRN) design that makes a few simple changes to the RoCE NIC for better handling of packet losses. We show that IRN (without PFC) outperforms RoCE (with PFC) by 6-83% for typical network scenarios. Thus not only does IRN eliminate…
Taxonomy
Topics: Software-Defined Networks and 5G · Cloud Computing and Resource Management · Network Traffic and Congestion Control
