OS-level Failure Injection with SystemTap
Camille Coti, Nicolas Greneche

TL;DR
This paper introduces a kernel-level failure injection system for Linux, built on SystemTap, that enables realistic and controlled failure scenarios for testing the robustness of distributed systems.
Contribution
It presents a novel failure injection method at the kernel level using SystemTap, allowing precise and flexible simulation of failures in distributed systems.
Findings
Enables deterministic failure injection scenarios
Supports probabilistic failure scenarios
Operates at kernel level for realistic failure simulation
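The two kinds of scenarios listed above differ in how the failure schedule is produced. A deterministic scenario fixes in advance which process dies and when. A probabilistic scenario draws failure times from a distribution. The sketch below shows only how such a schedule could be generated. It does not perform the kill itself, which the paper does at kernel level with SystemTap. The names `Failure`, `deterministic_scenario` and `probabilistic_scenario` are hypothetical, and the exponential failure model with mean time between failures `mtbf_s` is an assumption chosen for illustration, not necessarily the model used in the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    """Hypothetical record: kill `target` at `time_s` seconds into the run."""
    time_s: float
    target: str


def deterministic_scenario(events):
    """Fixed, reproducible schedule built from (time_s, target) pairs."""
    return sorted((Failure(t, target) for t, target in events),
                  key=lambda f: f.time_s)


def probabilistic_scenario(targets, mtbf_s, horizon_s, seed):
    """Assumed model: each target fails at most once, after an exponentially
    distributed delay with mean `mtbf_s`. Failures past `horizon_s` are dropped.
    A fixed `seed` makes a given random scenario replayable."""
    rng = random.Random(seed)
    failures = []
    for target in targets:
        t = rng.expovariate(1.0 / mtbf_s)
        if t <= horizon_s:
            failures.append(Failure(t, target))
    return sorted(failures, key=lambda f: f.time_s)
```

Seeding the random generator is a deliberate choice. A probabilistic scenario that exposed a bug can then be replayed exactly, which is what makes it useful for debugging.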
Abstract
Failure injection has become an important technique for experimenting with robust, resilient distributed systems. To reproduce real-life conditions, parts of the application must be killed without letting the operating system close the existing network connections in a "clean" way. When a process is simply killed, the OS closes its connections on its behalf. SystemTap is an infrastructure that probes the Linux kernel's internal calls. If processes are killed at kernel level, they can be destroyed without letting the OS do anything else. In this paper, we present a kernel-level failure injection system based on SystemTap and show how it can be used to implement deterministic and probabilistic failure scenarios.
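The abstract's motivating point can be seen from user space. Even a process killed with `SIGKILL`, which cannot be caught or handled, has its sockets closed by the kernel during process teardown. The surviving peer therefore observes an orderly end-of-stream rather than a silent failure. The minimal sketch below assumes Linux or another POSIX system, since it relies on `os.fork`. It uses a UNIX-domain `socket.socketpair()` as a stand-in for a network connection. For TCP, the analogous effect is that the kernel closes the connection for the dead process. The function name `peer_view_of_killed_process` is hypothetical.

```python
import os
import signal
import socket


def peer_view_of_killed_process() -> bytes:
    """Fork a child that holds one end of a connection, SIGKILL it, and
    return what the surviving peer reads afterwards."""
    parent_sock, child_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: keep only its own end of the connection, then die abruptly.
        parent_sock.close()
        os.kill(os.getpid(), signal.SIGKILL)
        os._exit(1)  # not reached; SIGKILL terminates the child immediately
    # Parent: drop its copy of the child's end so only the child holds it.
    child_sock.close()
    os.waitpid(pid, 0)
    # The kernel closed the child's socket during teardown, so the peer sees
    # a clean end-of-stream (empty read) instead of a hung connection.
    data = parent_sock.recv(1024)
    parent_sock.close()
    return data
```

Calling `peer_view_of_killed_process()` returns `b""`: the peer is told the connection ended, which is exactly the "clean" close that a user-space kill cannot avoid and that motivates killing processes at kernel level.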
