A Behavioral Theory for Distributed Systems with Weak Recovery

Giovanni Fabbretti; Ivan Lanese; Jean-Bernard Stefani

arXiv:2406.12574·cs.LO·July 9, 2025

A Behavioral Theory for Distributed Systems with Weak Recovery

Giovanni Fabbretti, Ivan Lanese, Jean-Bernard Stefani

PDF

Open Access

TL;DR

This paper introduces a formal calculus for modeling distributed systems with crash failures and weak recovery, providing a behavioral theory to reason about system equivalences and fault-tolerance.

Contribution

It presents a novel calculus inspired by Erlang and distributed π-calculus, along with a behavioral theory and weak bisimilarity for reasoning about failures and recovery.

Findings

01

Developed a formal calculus for crash failures with weak recovery

02

Established a behavioral equivalence using labelled transition systems

03

Provided a compositional proof technique for system equivalence

Abstract

Distributed systems can be subject to various kinds of partial failures, therefore building fault-tolerance or failure mitigation mechanisms for distributed systems remains an important domain of research. In this paper, we present a calculus to formally model distributed systems subject to crash failures with recovery. The recovery model considered in the paper is weak, in the sense that it makes no assumption on the exact state in which a failed node resumes its execution, only its identity has to be distinguishable from past incarnations of itself. Our calculus is inspired in part by the Erlang programming language and in part by the distributed $π$ -calculus with nodes and link failures (D $π$ F) introduced by Francalanza and Hennessy. In order to reason about distributed systems with failures and recovery we develop a behavioral theory for our calculus, in the form of a contextual…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsDistributed systems and fault tolerance