Building State Machine Replication Using Practical Network Synchrony
Yiliang Wan, Nitin Shivaraman, Akshaye Shenoi, Xiang Liu, Tao Luo, and Jialin Li

TL;DR
This paper demonstrates that modern data center networks can be engineered to provide strong synchrony, enabling the design of Chora, a highly efficient state machine replication protocol that more than doubles throughput over existing protocols.
Contribution
The paper introduces a practical network design that provides strong synchrony in the common case, together with Chora, a new replication protocol co-designed to exploit these conditions for substantially higher throughput.
Findings
Chora outperforms existing protocols by over 2x in throughput.
Engineered networks achieve a round bound under 2 microseconds.
Strong synchrony in data centers enables more efficient replication protocols.
Abstract
Distributed systems, such as state machine replication, are critical infrastructures for modern applications. Practical distributed protocols make minimum assumptions about the underlying network: They typically assume a partially synchronous or fully asynchronous network model. In this work, we argue that modern data center systems can be designed to provide strong synchrony properties in the common case, where servers move in synchronous lock-step rounds. We prove this hypothesis by engineering a practical design that uses a combination of kernel-bypass network, multithreaded architecture, and loosened round length, achieving a tight round bound under 2 µs. Leveraging our engineered networks with strong synchrony, we co-design a new replication protocol, Chora. Chora exploits the network synchrony property to efficiently pipeline multiple replication instances, while allowing all…
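The abstract says Chora uses lock-step synchrony to pipeline multiple replication instances. The toy model below is a minimal sketch of why synchronous rounds make pipelining attractive; it is not Chora's actual protocol. It assumes (hypothetically) that every instance needs a fixed number of rounds, `rounds_per_instance`, to commit, that a pipelined leader starts one new instance per round, and that a round lasts 2 µs, the upper bound reported above. The names `committed_slots` and `throughput_per_sec` are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass

# Upper bound on round length reported in the abstract (under 2 us).
ROUND_LENGTH_US = 2.0


@dataclass
class Instance:
    slot: int         # log position this replication instance decides
    start_round: int  # lock-step round in which the instance was started


def committed_slots(num_rounds: int, rounds_per_instance: int, pipelined: bool) -> list[int]:
    """Return the slots committed after num_rounds lock-step rounds.

    Hypothetical model: each instance commits at the end of its
    rounds_per_instance-th round. Pipelined mode starts a new instance
    every round; sequential mode waits until nothing is in flight.
    """
    committed: list[int] = []
    in_flight: list[Instance] = []
    next_slot = 0
    for r in range(num_rounds):
        # Start of round r: launch a new instance if the mode allows it.
        if pipelined or not in_flight:
            in_flight.append(Instance(next_slot, r))
            next_slot += 1
        # End of round r: commit every instance that has used up its rounds.
        remaining: list[Instance] = []
        for inst in in_flight:
            if r - inst.start_round + 1 >= rounds_per_instance:
                committed.append(inst.slot)
            else:
                remaining.append(inst)
        in_flight = remaining
    return committed


def throughput_per_sec(n_committed: int, num_rounds: int, round_us: float) -> float:
    """Commits per second, given the total elapsed time of num_rounds rounds."""
    return n_committed * 1e6 / (num_rounds * round_us)


if __name__ == "__main__":
    for mode in (True, False):
        done = committed_slots(1000, rounds_per_instance=2, pipelined=mode)
        rate = throughput_per_sec(len(done), 1000, ROUND_LENGTH_US)
        print("pipelined" if mode else "sequential", len(done), rate)
```

With `rounds_per_instance=2` over 1,000 rounds, the pipelined schedule commits 999 slots and the sequential one commits 500, or roughly 499,500 versus 250,000 commits per second at 2 µs rounds. In this model the gap widens as `rounds_per_instance` grows, because the sequential schedule pays the full commit latency for every slot while the pipelined one overlaps it.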
Taxonomy
Topics: Embedded Systems Design Techniques · Interconnection Networks and Systems · Distributed systems and fault tolerance
