Resilient Cloud-based Replication with Low Latency
Michael Eischer, Tobias Distler

TL;DR
Spider is a resilient geo-replication architecture that uses cloud infrastructure to achieve low latency and fault tolerance. It organizes replicas into loosely coupled groups, with the members of each group spread across availability zones of a single region.
Contribution
The paper introduces Spider, an architecture for Byzantine fault-tolerant geo-replication that exploits cloud availability zones to reduce both complexity and latency.
Findings
Achieves low response times by colocating replica groups near clients.
Uses reliable group-to-group channels with FIFO semantics for simplicity.
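Hosts the members of each group in different cloud fault domains (availability zones) of the same region, so that replicas within a group interact over short-distance links.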
Abstract
Existing approaches to tolerate Byzantine faults in geo-replicated environments require systems to execute complex agreement protocols over wide-area links and consequently are often associated with high response times. In this paper we address this problem with Spider, a resilient replication architecture for geo-distributed systems that leverages the availability characteristics of today's public-cloud infrastructures to minimize complexity and reduce latency. Spider models a system as a collection of loosely coupled replica groups whose members are hosted in different cloud-provided fault domains (i.e., availability zones) of the same geographic region. This structural organization makes it possible to achieve low response times by placing replica groups in close proximity to clients while still enabling the replicas of a group to interact over short-distance links. To handle the…
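The structural organization described in the abstract can be illustrated with a minimal sketch. It is not Spider's implementation: the class names, region and zone labels, group size, and latency figures are all hypothetical, chosen only to show the two ideas the abstract states, namely that every member of a group sits in a different availability zone of one region, and that a client is served by a group placed close to it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    name: str
    region: str
    zone: str  # availability zone (fault domain) inside the region


@dataclass(frozen=True)
class ReplicaGroup:
    region: str
    replicas: tuple

    def is_valid(self) -> bool:
        """All members share the group's region and no two share a zone."""
        zones = [r.zone for r in self.replicas]
        same_region = all(r.region == self.region for r in self.replicas)
        return same_region and len(set(zones)) == len(zones)


def nearest_group(groups, latency_ms):
    """Pick the group whose region has the lowest client-measured latency."""
    return min(groups, key=lambda g: latency_ms[g.region])


# Hypothetical deployment: names, group size, and latencies are illustrative.
eu = ReplicaGroup("eu-west", tuple(Replica(f"eu-{z}", "eu-west", z) for z in "abc"))
us = ReplicaGroup("us-east", tuple(Replica(f"us-{z}", "us-east", z) for z in "abc"))

client_latency_ms = {"eu-west": 12.0, "us-east": 95.0}
chosen = nearest_group([eu, us], client_latency_ms)
print(chosen.region, eu.is_valid(), us.is_valid())
```

The validity check encodes only the placement constraint. How groups coordinate with each other over wide-area links is outside the scope of this sketch.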
