Maximizing Error Injection Realism for Chaos Engineering with System Calls
Long Zhang, Brice Morin, Benoit Baudry, and Martin Monperrus

TL;DR
This paper introduces Phoebe, a fault injection framework that creates realistic system call errors based on production data, enabling systematic reliability testing of applications with minimal overhead.
Contribution
Phoebe is the first framework to generate realistic system call error models grounded in production errors for systematic reliability assessment.
Findings
Phoebe's error models accurately mimic errors that occur naturally in production.
It detects critical reliability weaknesses.
It operates with low runtime overhead.
Abstract
In this paper, we present a novel fault injection framework for system call invocation errors, called Phoebe. Phoebe is unique in three respects. First, Phoebe gives developers full observability of system call invocations. Second, Phoebe generates error models that are realistic in the sense that they mimic errors that naturally happen in production. Third, Phoebe automatically conducts experiments to systematically assess the reliability of applications with respect to system call invocation errors in production. We evaluate the effectiveness and runtime overhead of Phoebe on two real-world applications in a production environment. The results show that Phoebe successfully generates realistic error models and is able to detect important reliability weaknesses with respect to system call invocation errors. To our knowledge, this novel concept of "realistic error…
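The abstract does not say exactly how an error model is built from production observations. The Python sketch below assumes one plausible reading: count, for each system call, how often it failed with each errno, turn those counts into rates, optionally scale them by an amplification factor, and use the resulting rates to decide whether to fail a given invocation. The names `SyscallEvent`, `build_error_model`, `amplify`, and `choose_error`, and the amplification factor itself, are hypothetical illustrations rather than Phoebe's API. Observing and failing real system calls requires kernel-level tracing and injection, which this sketch does not attempt.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


# Hypothetical record of one observed system call invocation in production.
@dataclass(frozen=True)
class SyscallEvent:
    syscall: str  # e.g. "read"
    errno: str    # e.g. "EAGAIN"; the empty string means the call succeeded


# (syscall, errno) -> fraction of that syscall's invocations failing with errno
ErrorModel = Dict[Tuple[str, str], float]


def build_error_model(events: Iterable[SyscallEvent]) -> ErrorModel:
    """Derive per-(syscall, errno) error rates from observed invocations."""
    events = list(events)
    invocations = Counter(e.syscall for e in events)
    failures = Counter((e.syscall, e.errno) for e in events if e.errno)
    return {key: n / invocations[key[0]] for key, n in failures.items()}


def amplify(model: ErrorModel, factor: float) -> ErrorModel:
    """Scale every observed rate by `factor` (an assumed knob), capped at 1.0."""
    return {key: min(1.0, rate * factor) for key, rate in model.items()}


def choose_error(model: ErrorModel, syscall: str,
                 rng: random.Random) -> Optional[str]:
    """Return the errno to inject for this invocation, or None to let it run.

    A single uniform draw is compared against the cumulative rates of this
    syscall's errnos, so at most one errno is chosen per invocation.
    """
    draw = rng.random()
    cumulative = 0.0
    for (name, errno), rate in sorted(model.items()):
        if name != syscall:
            continue
        cumulative += rate
        if draw < cumulative:
            return errno
    return None


# Usage: three successful reads, one EAGAIN read, one successful and one
# ENOSPC write.
observed = [SyscallEvent("read", "")] * 3 + [
    SyscallEvent("read", "EAGAIN"),
    SyscallEvent("write", ""),
    SyscallEvent("write", "ENOSPC"),
]
model = build_error_model(observed)
print(model)  # {('read', 'EAGAIN'): 0.25, ('write', 'ENOSPC'): 0.5}
```

Keeping the observed rates as the baseline is what makes the injected errors "realistic" under this reading: the experiment only fails calls in ways already seen in production, and amplification merely makes those failures frequent enough to observe within a short experiment.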
