The Low Latency Fault Tolerance System
Wenbing Zhao, P. M. Melliar-Smith, L. E. Moser

TL;DR
The LLFT system provides low latency fault tolerance for distributed applications through leader-follower replication, with strong replica consistency, fast recovery after faults, and application-transparent mechanisms.
Contribution
It introduces a comprehensive low latency fault tolerance system combining messaging, membership, and determinization protocols for distributed applications.
Findings
Provides reliable, totally ordered message delivery
Enables fast reconfiguration and recovery after faults
Ensures strong replica consistency during normal and fault conditions
Abstract
The Low Latency Fault Tolerance (LLFT) system provides fault tolerance for distributed applications, using the leader-follower replication technique. The LLFT system provides application-transparent replication, with strong replica consistency, for applications that involve multiple interacting processes or threads. The LLFT system comprises a Low Latency Messaging Protocol, a Leader-Determined Membership Protocol, and a Virtual Determinizer Framework. The Low Latency Messaging Protocol provides reliable, totally ordered message delivery by employing a direct group-to-group multicast, where the message ordering is determined by the primary replica in the group. The Leader-Determined Membership Protocol provides reconfiguration and recovery when a replica becomes faulty and when a replica joins or leaves a group, where the membership of the group is determined by the primary replica. The…
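The leader-determined ordering described in the abstract can be illustrated with a minimal sketch. It is not the LLFT implementation: the class names `Primary` and `Backup`, the sequence-number scheme, and the `demo` function are all assumptions introduced here for illustration. The primary alone assigns the order, and each backup delivers messages strictly in that order, buffering any that arrive early and dropping duplicates.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Hypothetical primary replica: the only party that decides message order."""
    next_seq: int = 0

    def order(self, msg):
        # Stamp the message with the next sequence number chosen by the primary.
        seq = self.next_seq
        self.next_seq += 1
        return seq, msg


@dataclass
class Backup:
    """Hypothetical backup replica: delivers in the order fixed by the primary."""
    expected: int = 0
    pending: dict = field(default_factory=dict)
    delivered: list = field(default_factory=list)

    def receive(self, seq, msg):
        if seq < self.expected:
            return  # already delivered; ignore the retransmitted duplicate
        self.pending[seq] = msg
        # Deliver every consecutive message that is now available.
        while self.expected in self.pending:
            self.delivered.append(self.pending.pop(self.expected))
            self.expected += 1


def demo():
    primary = Primary()
    stamped = [primary.order(m) for m in ("a", "b", "c")]
    b1, b2 = Backup(), Backup()
    # b1 sees the messages in order; b2 sees them reordered, with one duplicate.
    for seq, msg in stamped:
        b1.receive(seq, msg)
    for seq, msg in (stamped[2], stamped[0], stamped[0], stamped[1]):
        b2.receive(seq, msg)
    return b1.delivered, b2.delivered
```

In this sketch both backups end up with the same delivery sequence regardless of arrival order, which is the property a total order is meant to give. Retransmission, acknowledgments, and the group-to-group multicast itself are left out.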
Taxonomy
Topics: Distributed systems and fault tolerance · Age of Information Optimization · Distributed and Parallel Computing Systems
