CD-Raft: Reducing the Latency of Distributed Consensus in Cross-Domain Sites
Yangyang Wang, Ziqian Cheng, Yucong Dong, Zichen Xu

TL;DR
CD-Raft is an optimized Raft consensus protocol for cross-domain sites. It significantly reduces the consensus latency of data synchronization for distributed AI workloads, and its correctness is formally verified.
Contribution
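It introduces CD-Raft, an adaptation of Raft that reduces cross-domain consensus latency by optimizing the cross-domain round-trip time (RTT) for reads and writes and by carefully positioning the leader node. The protocol is validated with a formal TLA+ specification and empirical evaluation on the YCSB benchmark.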
Findings
Reduces average latency by 32.90%
Cuts tail latency by 49.24%
Validated correctness with TLA+ specification
Abstract
Today's massive AI computation loads push heavy data synchronization across sites, i.e., nodes in data centers. Any reduction in the resulting consensus latency can significantly improve overall system performance. The consensus challenge is most acute at cross-domain sites. In this paper, we propose CD-Raft, an optimized Raft protocol for strong consistency in cross-domain sites, to address the cross-domain latency challenge. CD-Raft significantly reduces consensus latency by optimizing the cross-domain round-trip time (RTT) for reads and writes and by carefully positioning the leader node. We verified the correctness of CD-Raft with a formal TLA+ specification, guaranteeing strong consistency across sites. We have prototyped CD-Raft and evaluated it using the YCSB benchmark. Empirical results show that compared to the classic Raft, CD-Raft…
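To make the leader-placement point concrete, here is a minimal sketch of why the leader's location matters for write latency in a majority-quorum protocol. It is not CD-Raft's algorithm, which the abstract does not spell out; it only models classic Raft-style commits, where the leader waits for acknowledgements from a majority of nodes. The site names, the RTT values, and the helpers `build_rtt`, `quorum_commit_latency`, and `best_leader` are all hypothetical.

```python
from typing import Dict

RttMatrix = Dict[str, Dict[str, float]]


def build_rtt(domains: Dict[str, str], intra_ms: float, cross_ms: float) -> RttMatrix:
    """Build a symmetric RTT matrix from a node -> domain map (hypothetical values)."""
    return {
        n: {
            m: 0.0 if m == n else (intra_ms if dm == dn else cross_ms)
            for m, dm in domains.items()
        }
        for n, dn in domains.items()
    }


def quorum_commit_latency(leader: str, rtt_ms: RttMatrix) -> float:
    """RTT the leader must wait to collect a majority, counting its own vote."""
    majority = len(rtt_ms) // 2 + 1
    acks_needed = majority - 1  # the leader already holds one vote
    if acks_needed == 0:
        return 0.0
    follower_rtts = sorted(rtt_ms[leader][n] for n in rtt_ms if n != leader)
    # The commit completes when the slowest of the nearest `acks_needed` followers replies.
    return follower_rtts[acks_needed - 1]


def best_leader(rtt_ms: RttMatrix) -> str:
    """Pick the node with the lowest majority-commit latency."""
    return min(rtt_ms, key=lambda n: quorum_commit_latency(n, rtt_ms))


if __name__ == "__main__":
    # Assumed topology: three nodes in domain A, two in domain B.
    domains = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
    rtt = build_rtt(domains, intra_ms=2.0, cross_ms=100.0)
    print(quorum_commit_latency("a1", rtt))  # 2.0: majority reachable inside domain A
    print(quorum_commit_latency("b1", rtt))  # 100.0: majority requires a cross-domain hop
    print(best_leader(rtt))
```

In this toy topology a leader in domain A can commit without leaving its domain, while a leader in domain B pays one cross-domain round trip on every write, which is the kind of cost that leader positioning aims to avoid.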
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed systems and fault tolerance · Parallel Computing and Optimization Techniques
