An Algorithm for Tolerating Crash Failures in Distributed Systems
Vincenzo De Florio, Geert Deconinck, Rudy Lauwereins

TL;DR
This paper presents a distributed algorithm, called mutual suspicion, that makes the TIRAN backbone itself fault tolerant: the backbone can tolerate crash failures affecting all but one of its components or nodes in an embedded distributed system.
Contribution
The paper introduces a novel distributed algorithm for crash failure tolerance within a fault-tolerant framework for embedded systems.
Findings
The algorithm tolerates crash failures affecting all but one component or node.
It enhances fault tolerance in embedded distributed systems.
The approach improves system reliability under fault conditions.
Abstract
In the framework of the ESPRIT project 28620 "TIRAN" (tailorable fault tolerance frameworks for embedded applications), a toolset of error detection, isolation, and recovery components is being designed to serve as a basic means for orchestrating application-level fault tolerance. These tools will be used either as stand-alone components or as the peripheral components of a distributed application that we call "the backbone". The backbone is to run in the background of the user application. Its objectives include (1) gathering and maintaining error detection information produced by TIRAN components such as watchdog timers and trap handlers, or by external detection services working at kernel or driver level, and (2) using this information at error recovery time. In particular, those TIRAN tools related to error detection and fault masking will forward their deductions to the backbone that,…
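The two backbone objectives named in the abstract, collecting error detection information and making it available at recovery time, can be illustrated with a small sketch. This is only a toy model under stated assumptions, not the TIRAN implementation: the names `ErrorReport`, `Backbone`, `notify`, and `history`, as well as the component labels, are hypothetical and do not come from the paper. The sketch also does not implement the mutual suspicion algorithm itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorReport:
    """One detection event forwarded to the backbone (hypothetical record layout)."""
    source: str      # detector that raised the event, e.g. "watchdog"
    component: str   # component the detector points at
    detail: str      # free-form description of what was observed


class Backbone:
    """Toy stand-in for the backbone's bookkeeping role (an assumption, not TIRAN code)."""

    def __init__(self):
        # Maps a component name to the reports received about it, in arrival order.
        self._reports = defaultdict(list)

    def notify(self, report):
        # Objective (1): gather and maintain error detection information.
        self._reports[report.component].append(report)

    def history(self, component):
        # Objective (2): make the stored information available at recovery time.
        # Returns a copy so callers cannot alter the stored records.
        return list(self._reports.get(component, []))


backbone = Backbone()
backbone.notify(ErrorReport("watchdog", "task-A", "deadline missed"))
backbone.notify(ErrorReport("trap handler", "task-A", "illegal instruction"))
backbone.notify(ErrorReport("watchdog", "task-B", "deadline missed"))

for report in backbone.history("task-A"):
    print(report.source, "->", report.detail)
```

In this sketch a single in-memory object holds all reports, which is exactly the kind of single point of failure that a distributed backbone tolerating crash failures is meant to avoid.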
Taxonomy
Topics: Distributed systems and fault tolerance · Software System Performance and Reliability · Radiation Effects in Electronics
