Correct Black-Box Monitors for Distributed Deadlock Detection: Formalisation and Implementation (Technical Report)
Rados{\l}aw Jan Rowicki, Adrian Francalanza, Alceste Scalas

TL;DR
This paper introduces a formal, correct, and implementable approach for distributed black-box deadlock detection in microservices, using message observation and probe exchange, validated through a tool for Erlang/OTP systems.
Contribution
It formalizes, proves correctness, and implements the first distributed black-box deadlock monitors for message-passing systems, validated with a tool and mechanized in Coq.
Findings
Monitor is sound and complete in deadlock detection.
Implementation in DDMon demonstrates practical applicability.
Formal proof of correctness ensures reliability of the detection method.
Abstract
Many software applications rely on concurrent and distributed (micro)services that interact via message-passing and various forms of remote procedure calls (RPC). As these systems organically evolve and grow in scale and complexity, the risk of introducing deadlocks increases and their impact may worsen: even if only a few services deadlock, many other services may block while awaiting responses from the deadlocked ones. As a result, the "core" of the deadlock can be obfuscated by its consequences on the rest of the system, and diagnosing and fixing the problem can be challenging. In this work we tackle the challenge by proposing distributed black-box monitors that are deployed alongside each service and detect deadlocks by only observing the incoming and outgoing messages, and exchanging probes with other monitors. We present a formal model that captures popular RPC-based application…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
