uBFT: Microsecond-scale BFT using Disaggregated Memory [Extended Version]
Marcos K. Aguilera, Naama Ben-David, Rachid Guerraoui, Antoine Murat, Athanasios Xygkis, Igor Zablotchi

TL;DR
uBFT is a novel Byzantine Fault Tolerant system achieving microsecond-scale latency in data centers by leveraging disaggregated memory and a new abstraction, enabling fast, reliable replication of key applications.
Contribution
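The paper introduces uBFT, the first SMR system to achieve microsecond-scale latency while tolerating f Byzantine failures with only 2f+1 replicas. It relies on disaggregated memory as a small trusted computing base and builds on a novel abstraction, Consistent Tail Broadcast, which prevents equivocation while bounding memory.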
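To put the 2f+1 figure in context, the following minimal Python sketch compares it with the 3f+1 replicas required by classical BFT protocols that have no trusted component (PBFT, for example). The function names are hypothetical and chosen for illustration only; they are not part of uBFT.

```python
def replicas_with_trusted_base(f: int) -> int:
    """Replicas needed to tolerate f Byzantine failures at 2f+1 (the count uBFT reports)."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 2 * f + 1


def replicas_classical_bft(f: int) -> int:
    """Replicas needed by classical BFT without a trusted component (3f+1)."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def comparison_table(max_f: int) -> list[tuple[int, int, int]]:
    """Return (f, 2f+1, 3f+1) rows for f = 1..max_f."""
    return [
        (f, replicas_with_trusted_base(f), replicas_classical_bft(f))
        for f in range(1, max_f + 1)
    ]


if __name__ == "__main__":
    for f, small, classical in comparison_table(3):
        print(f"f={f}: 2f+1={small}, 3f+1={classical}")
```

The saving is f replicas for every tolerated-failure level f.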
Findings
End-to-end latency as low as 10 microseconds
At least 50x faster than MinBFT, a state-of-the-art BFT SMR system based on Intel's SGX
Successfully replicated key-value stores and a financial engine
Abstract
We propose uBFT, the first State-Machine Replication (SMR) system to achieve microsecond-scale latency in data centers, while using only 2f+1 replicas to tolerate f Byzantine failures. The Byzantine Fault Tolerance (BFT) provided by uBFT is essential as pure crashes appear to be a mere illusion with real-life systems reportedly failing in many unexpected ways. uBFT relies on a small non-tailored trusted computing base -- disaggregated memory -- and consumes a practically bounded amount of memory (both local and disaggregated). uBFT is based on a novel abstraction called Consistent Tail Broadcast, which we use to prevent equivocation while bounding memory. We implement uBFT using RDMA-based disaggregated memory and obtain an end-to-end latency of as little as 10 µs. This is at least 50× faster than MinBFT, a state-of-the-art BFT SMR system based on Intel's SGX. We use uBFT…
Taxonomy
Topics: Distributed systems and fault tolerance · Cloud Data Security Solutions · Software System Performance and Reliability
