From RDMA to RDCA: Toward High-Speed Last Mile of Data Center Networks Using Remote Direct Cache Access
Qiang Li, Qiao Xiang, Derui Liu, Yuxin Wang, Haonan Qiu, Xiaoliang Wang, Jie Zhang, Ridi Wen, Haohao Song, Gexiao Tian, Chenyang Huang, Lulu Chen, Shaozong Liu, Yaohui Wu, Zhiwu Wu, Zicheng Luo, Yuchao Shao, Chao Han, Zhongjie Wu, Jianbo Dong, Zheng Cao, Jinbo Wu, Jiwu Shu

TL;DR
This paper introduces Lamda, a receiver cache system that bypasses host memory to improve network throughput and latency in high-speed RDMA networks, especially for storage and HPC applications.
Contribution
It proposes a novel cache-based processing system, Lamda, that reduces memory bandwidth contention and enhances network performance in data center environments.
Findings
Lamda improves network throughput by 4.7% with zero memory bandwidth use.
It increases storage application throughput by up to 17% and 45%, depending on block size.
Latency for HPC applications is reduced by 35.1%.
Abstract
In this paper, we conduct systematic measurement studies to show that the high memory bandwidth consumption of modern distributed applications can lead to a significant drop in network throughput and a large increase in tail latency in high-speed RDMA networks. We identify the root cause as high contention for memory bandwidth between application processes and network processes. This contention leads to frequent packet drops at the NIC of receiving hosts, which triggers the network's congestion control mechanism and eventually degrades network performance. To tackle this problem, we make a key observation: in a distributed storage service, the vast majority of the data it receives from the network will eventually be written to high-speed storage media (e.g., SSD) by the CPU. As such, we propose to bypass host memory when processing received data to completely…
Taxonomy
Topics: Cloud Computing and Resource Management · Caching and Content Delivery · Advanced Data Storage Technologies
