Building on Quicksand
Pat Helland (Microsoft), David Campbell (Microsoft)

TL;DR
This paper discusses the evolution of fault tolerance in systems built from unreliable components, emphasizing asynchronous state capture, eventual consistency, and probabilistic guarantees to improve responsiveness and availability.
Contribution
It analyzes the shift towards relaxed fault tolerance models with asynchronous state capture and explores future directions for building reliable systems on increasingly unreliable components.
Findings
Asynchronous state capture leads to probabilistic guarantees.
Eventual consistency becomes essential for system reliability.
Emerging patterns enable looser consistency models for better availability.
Abstract
Reliable systems have always been built out of unreliable components. Early on, the reliable components were small, such as mirrored disks or ECC (Error Correcting Codes) in core memory. These systems were designed such that failures of these small components were transparent to the application. Later, the size of the unreliable components grew larger and semantic challenges crept into the application when failures occurred. As the granularity of the unreliable component grows, the latency to communicate with a backup becomes unpalatable. This leads to a more relaxed model for fault tolerance. The primary system will acknowledge the work request and its actions without waiting to ensure that the backup is notified of the work. This improves the responsiveness of the system. There are two implications of asynchronous state capture: 1) Everything promised by the primary is…
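The relaxed model described in the abstract, where the primary acknowledges work before the backup has heard of it, can be illustrated with a small sketch. This is not code from the paper: the class names `Primary` and `Backup` and the methods `handle`, `ship`, and `unshipped` are hypothetical, chosen only to show that acknowledged work can still be missing from the backup if the primary fails before shipping it.

```python
from collections import deque


class Backup:
    """Hypothetical backup that records whatever the primary ships to it."""

    def __init__(self):
        self.log = []

    def apply(self, op):
        self.log.append(op)


class Primary:
    """Hypothetical primary that acknowledges before notifying the backup."""

    def __init__(self, backup):
        self.backup = backup
        self.log = []           # everything the primary has accepted
        self.pending = deque()  # acknowledged, but not yet sent to the backup

    def handle(self, op):
        # Record the work locally and acknowledge immediately.
        # The backup is NOT contacted here, which keeps latency low.
        self.log.append(op)
        self.pending.append(op)
        return "ack"

    def ship(self, max_ops=None):
        # Asynchronous state capture: forward queued work some time later.
        count = len(self.pending) if max_ops is None else min(max_ops, len(self.pending))
        for _ in range(count):
            self.backup.apply(self.pending.popleft())

    def unshipped(self):
        # Work the backup would not know about if the primary failed now.
        return list(self.pending)


if __name__ == "__main__":
    backup = Backup()
    primary = Primary(backup)
    primary.handle("debit 10")
    primary.handle("credit 5")
    primary.ship(max_ops=1)   # only the first operation reaches the backup
    primary.handle("debit 3")
    print("backup knows:", backup.log)
    print("at risk if primary fails now:", primary.unshipped())
```

In this sketch, every call to `handle` returns an acknowledgement right away, so anything still in `pending` at the moment of a failure has been promised to the caller but is unknown to the backup. That gap is the reason the guarantee becomes probabilistic rather than absolute.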
Taxonomy
Topics: Distributed systems and fault tolerance · Distributed and Parallel Computing Systems · Cloud Computing and Resource Management
