Monitoring Partially Synchronous Distributed Systems using SMT Solvers
Vidhya Tekken Valapil, Sorrachai Yingchareonthawornchai, Sandeep, Kulkarni, Eric Torng, Murat Demirbas

TL;DR
This paper presents a framework for monitoring partially synchronous distributed systems to detect latent bugs using SMT solvers, demonstrating feasibility through synthetic and real-world scenarios.
Contribution
It introduces a novel SMT-based monitoring approach for detecting concurrency-related bugs in distributed systems, expanding the applicability of formal verification techniques.
Findings
Framework can detect latent bugs in synthetic scenarios.
Verification time depends on communication frequency, latency, and clock skew.
Applicable to real-life distributed applications.
Abstract
In this paper, we discuss the feasibility of monitoring partially synchronous distributed systems to detect latent bugs, i.e., errors caused by concurrency and race conditions among concurrent processes. We present a monitoring framework where we model both system constraints and latent bugs as Satisfiability Modulo Theories (SMT) formulas, and we detect the presence of latent bugs using an SMT solver. We demonstrate the feasibility of our framework using both synthetic applications where latent bugs occur at any time with random probability and an application involving exclusive access to a shared resource with a subtle timing bug. We illustrate how the time required for verification is affected by parameters such as communication frequency, latency, and clock skew. Our results show that our framework can be used for real-life applications, and because our framework uses SMT solvers,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDistributed systems and fault tolerance · Software System Performance and Reliability · Real-Time Systems Scheduling
