Understanding Inconsistency in Azure Cosmos DB with TLA+
A. Finn Hackett, Joshua Rowe, Markus Alexander Kuppe

TL;DR
This paper presents a formal specification of Azure Cosmos DB's user-visible behaviors using TLA+, clarifying semantics, identifying issues, and improving reliability for dependent Microsoft services.
Contribution
It introduces a simplified, comprehensive formal specification of Cosmos DB's semantics that uncovers previously misunderstood behaviors and informs improvements in documentation and system reliability.
Findings
Identified inconsistencies in Cosmos DB's documentation
Raised issues leading to documentation updates
Provided a solution to a major outage
Abstract
Beyond implementation correctness of a distributed system, it is equally important to understand exactly what users should expect to see from that system. Even if the system itself works as designed, insufficient understanding of its user-visible semantics can cause bugs in its dependencies. By focusing a formal specification effort on precisely defining the expected user-facing behaviors of the Azure Cosmos DB service at Microsoft, we were able to write a formal specification of the database that was significantly smaller and conceptually simpler than any other specification of Cosmos DB, while representing a wider range of valid user-observable behaviors than existing more detailed specifications. Many of the additional behaviors we documented were previously poorly understood outside of the Cosmos DB development team, even informally, leading to data consistency errors in Microsoft…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDistributed systems and fault tolerance · Software System Performance and Reliability · Distributed and Parallel Computing Systems
