Measuring Masking Fault-Tolerance
Pablo F. Castro, Pedro R. D'Argenio, Ramiro Demasi, and Luciano Putruele

TL;DR
This paper introduces a quantitative measure of masking fault-tolerance in systems, capturing how well a system can hide faults without observable effects, and provides tools for automatic analysis.
Contribution
It defines a novel masking fault-tolerance distance using simulation relations and game theory, and implements a prototype tool for automatic measurement.
Findings
The masking distance is a directed pseudo metric.
The approach is validated on multiple case studies.
The tool effectively measures masking fault-tolerance levels.
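For readers unfamiliar with the term, the first finding can be spelled out under the standard definition of a directed pseudometric. The symbol \delta and the systems A, B, C are notation chosen here for illustration and may differ from the paper's own.

```latex
% Standard axioms of a directed pseudometric (illustrative notation).
% \delta(A, B) is read as the distance from system A to system B.
\begin{align*}
  \delta(A, A) &= 0
    && \text{(reflexivity)} \\
  \delta(A, C) &\le \delta(A, B) + \delta(B, C)
    && \text{(triangle inequality)}
\end{align*}
% Symmetry is not required: \delta(A, B) may differ from \delta(B, A).
% Distinct systems may be at distance zero: \delta(A, B) = 0 does not imply A = B.
```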
Abstract
In this paper we introduce a notion of fault-tolerance distance between labeled transition systems. Intuitively, this notion of distance measures the degree of fault-tolerance exhibited by a candidate system. In practice, there are different kinds of fault-tolerance; here we restrict ourselves to the analysis of masking fault-tolerance because it is often a highly desirable goal for critical systems. Roughly speaking, a system is masking fault-tolerant when it is able to completely mask the faults, not allowing these faults to have any observable consequences for the users. We capture masking fault-tolerance via a simulation relation, which is accompanied by a corresponding game characterization. We enrich the resulting games with quantitative objectives to define the notion of masking fault-tolerance distance. Furthermore, we investigate the basic properties of this notion of masking…
Taxonomy
Topics: Distributed systems and fault tolerance · Software Reliability and Analysis Research · Formal Methods in Verification
