Real Life Is Uncertain. Consensus Should Be Too!
Reginald Frank, Soujanya Ponnapalli, Octavio Lomeli, Neil Giridharan, Marcos K Aguilera, and Natacha Crooks

TL;DR
This paper advocates for adopting a probabilistic failure model in distributed consensus protocols, moving beyond traditional fixed-threshold models to better reflect real-world fault complexities, enabling more reliable and efficient systems.
Contribution
It introduces a probabilistic failure model for consensus protocols, allowing for optimization based on individual machine failure behaviors rather than fixed failure thresholds.
Findings
Probabilistic models better capture real-world fault behaviors.
Potential to improve system reliability and efficiency.
Enables bypassing traditional quorum intersection bottlenecks.
Abstract
Modern distributed systems rely on consensus protocols to build a fault-tolerant core upon which they can build applications. Consensus protocols are correct under a specific failure model, where up to f machines can fail. We argue that this f-threshold failure model oversimplifies the real world and limits potential opportunities to optimize for cost or performance. We argue instead for a probabilistic failure model that captures the complex and nuanced nature of faults observed in practice. Probabilistic consensus protocols can explicitly leverage individual machine failure curves and explore side-stepping traditional bottlenecks such as majority quorum intersection, enabling systems that are more reliable, efficient, cost-effective, and sustainable.
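To make the contrast with the f-threshold model concrete, the following minimal sketch computes how likely it is that more than f machines are faulty at once, given a per-machine failure probability. It is an illustration only, not the paper's method: it assumes machine failures are independent and that each machine's failure curve has already been reduced to a single probability for the time window of interest. The function names `failure_count_distribution` and `prob_more_than_f_failures` are hypothetical.

```python
from typing import List


def failure_count_distribution(p_fail: List[float]) -> List[float]:
    """Return dist where dist[k] is the probability that exactly k machines fail.

    Assumption (not from the paper): machines fail independently, and
    p_fail[i] is machine i's failure probability over the window of interest.
    """
    dist = [1.0]  # with zero machines, zero failures is certain
    for p in p_fail:
        nxt = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            nxt[k] += prob * (1.0 - p)  # this machine stays up
            nxt[k + 1] += prob * p      # this machine fails
        dist = nxt
    return dist


def prob_more_than_f_failures(p_fail: List[float], f: int) -> float:
    """Probability that the f-threshold assumption is violated."""
    dist = failure_count_distribution(p_fail)
    return sum(dist[f + 1:])


if __name__ == "__main__":
    # Hypothetical clusters of three machines, each tolerating f = 1 failure.
    uniform = [0.01, 0.01, 0.01]
    mixed = [0.001, 0.001, 0.2]
    print(prob_more_than_f_failures(uniform, 1))
    print(prob_more_than_f_failures(mixed, 1))
```

Both clusters satisfy the same "f = 1 of 3" threshold, yet the chance of exceeding that threshold differs with the individual machines' failure probabilities. That difference is the kind of information a fixed threshold discards and a probabilistic model can use.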
Taxonomy
Topics: Distributed systems and fault tolerance · Software System Performance and Reliability · Cloud Computing and Resource Management
