TL;DR
This paper conducts an exhaustive analysis of failure-oblivious computing behaviors across 16 real-world Java failures, revealing the diversity and scope of possible failure-handling strategies to inform future research and practical applications.
Contribution
It provides the first comprehensive exploration of the failure-oblivious search space, offering insights into its size, diversity, and implications for software reliability.
Findings
Failure-oblivious behaviors are highly diverse.
The search space for failure-oblivious strategies is large.
Understanding this space can guide better failure-handling techniques.
Abstract
High availability of software systems requires automated handling of crashes in the presence of errors. Failure-oblivious computing is one technique that aims to achieve high availability. We note that failure-obliviousness has not yet been studied in depth, and there are very few studies that help explain why failure-oblivious techniques work. For failure-oblivious computing to have an impact in practice, we need to deeply understand failure-oblivious behaviors in software. In this paper, we design and perform an experiment that analyzes the size and the diversity of failure-oblivious behaviors. Our experiment consists of exhaustively computing the search space of 16 field failures of large-scale open-source Java software. The outcome of this experiment is a much better understanding of what really happens when failure-oblivious computing is used, and this opens…
