MatchRDMA: A Segmented and Rate-Matched Long-Haul RDMA Scheme for Geo-distributed LLM Training over OTN
Jun Dai, Xiaorun Wang, Xingde Li, Zheng Yang, Kexiong Fang, Zhiqun Gu, Hongxiang Wang, Yuefeng Ji, and Jiawei Zhang

TL;DR
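MatchRDMA is a long-haul RDMA scheme for geo-distributed LLM training over optical transport networks (OTN). By coordinating source and destination OTN rates, it raises inter-DC throughput by up to 20x over conventional RDMA and cuts destination-OTN buffer occupancy by up to 62.7%.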
Contribution
It introduces a proactive, segmented, and rate-matched RDMA scheme that improves inter-DC throughput and reduces destination-OTN buffer occupancy over OTN.
Findings
Up to 20x higher inter-DC throughput than conventional RDMA.
Up to 62.7% lower destination-OTN buffer occupancy.
Both gains come from coordinating source and destination OTN rates.
Abstract
We propose MatchRDMA, a proactive, segmented, and rate-matched long-haul RDMA scheme for geo-distributed LLM training over OTN. By coordinating source and destination OTN rates, it improves inter-DC throughput by up to 20x compared with conventional RDMA, and reduces destination-OTN buffer occupancy by up to 62.7%.
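To give some intuition for why matching source and destination rates can reduce buffer occupancy, here is a minimal fluid-buffer sketch. It is not the paper's algorithm: the function names, the 100 Gb/s and 80 Gb/s rates, and the 10 ms window are all hypothetical values chosen for illustration. The model only assumes that a destination buffer fills whenever data arrives faster than it drains.

```python
def peak_buffer_occupancy(src_rate_gbps, dst_rate_gbps, duration_s, step_s=0.001):
    """Peak backlog (in gigabits) of a buffer fed at src_rate and drained at dst_rate.

    Toy fluid model: in each time step the backlog changes by
    (arrival rate - drain rate) * step, and it can never go below zero.
    """
    backlog = 0.0
    peak = 0.0
    steps = int(round(duration_s / step_s))
    for _ in range(steps):
        backlog = max(0.0, backlog + (src_rate_gbps - dst_rate_gbps) * step_s)
        peak = max(peak, backlog)
    return peak


def matched_rate(src_rate_gbps, dst_rate_gbps):
    """Hypothetical rate-matching rule: never send faster than the destination drains."""
    return min(src_rate_gbps, dst_rate_gbps)


# Hypothetical scenario: source side can send at 100 Gb/s, destination drains at 80 Gb/s.
SRC_GBPS, DST_GBPS, WINDOW_S = 100.0, 80.0, 0.01

unmatched_peak = peak_buffer_occupancy(SRC_GBPS, DST_GBPS, WINDOW_S)
matched_peak = peak_buffer_occupancy(matched_rate(SRC_GBPS, DST_GBPS), DST_GBPS, WINDOW_S)

print(f"unmatched peak backlog: {unmatched_peak:.3f} Gb")
print(f"matched peak backlog:   {matched_peak:.3f} Gb")
```

In this toy setting the unmatched sender builds up a backlog of roughly 0.2 Gb over 10 ms, while the matched sender builds none. The figures reported in the abstract come from the authors' own evaluation, not from this model.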
