A Short Counterexample Property for Safety and Liveness Verification of Fault-tolerant Distributed Algorithms
Igor Konnov, Marijana Lazic, Helmut Veith, Josef Widder

TL;DR
This paper presents an automated model checking method for fault-tolerant distributed algorithms that uses a short counterexample property, enabling verification of safety and liveness across parameterized systems.
Contribution
It introduces a novel short counterexample property for model checking, applicable to parameterized distributed algorithms with threshold guards, extending the capabilities of existing tools.
Findings
Verified 10 prominent fault-tolerant algorithms
Most algorithms were previously out of reach for existing techniques
The method efficiently shortens counterexamples independent of system parameters
Abstract
Distributed algorithms have many mission-critical applications ranging from embedded systems and replicated databases to cloud computing. Due to asynchronous communication, process faults, or network failures, these algorithms are difficult to design and verify. Many algorithms achieve fault tolerance by using threshold guards that, for instance, ensure that a process waits until it has received an acknowledgment from a majority of its peers. Consequently, domain-specific languages for fault-tolerant distributed systems offer language support for threshold guards. We introduce an automated method for model checking of safety and liveness of threshold-guarded distributed algorithms in systems where the number of processes and the fraction of faulty processes are parameters. Our method is based on a short counterexample property: if a distributed algorithm violates a temporal…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
