Chaos Engineering
Ali Basiri, Niosha Behnam, Ruud de Rooij, Lorin Hochstein, Luke Kosewski, Justin Reynolds, Casey Rosenthal

TL;DR
Chaos Engineering is a systematic approach to testing and improving the reliability of complex distributed systems through controlled experiments that simulate failures.
Contribution
This paper introduces Chaos Engineering, outlining its principles and demonstrating how it can be used to verify system reliability in distributed environments.
Findings
Chaos Engineering helps identify failure modes in complex systems.
Experimentation improves system robustness and reliability.
The paper discusses the principles underlying Chaos Engineering and how to apply them when running experiments.
Abstract
Modern software-based services are implemented as distributed systems with complex behavior and failure modes. Many large tech organizations are using experimentation to verify the reliability of such systems. We use the term "Chaos Engineering" to refer to this approach, and discuss the underlying principles and how to use it to run experiments.
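To make the idea of "using experimentation to verify reliability" concrete, here is a minimal sketch of one way such an experiment could be structured: measure a success rate on an untouched control copy of a service, inject a failure into an otherwise identical copy, and check whether the two rates stay within a tolerance. This is an illustration under stated assumptions, not the authors' tooling. `Replica`, `Service`, `run_experiment`, `kill_first_replica`, and the 1% tolerance are all hypothetical names and values, and the "service" is an in-memory toy rather than a real distributed system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Replica:
    """Hypothetical stand-in for one instance of a service."""
    name: str
    healthy: bool = True

    def handle(self, request_id: int) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unavailable")
        return f"{self.name} served request {request_id}"


class Service:
    """Toy service that can retry a request on the other replicas."""

    def __init__(self, replicas: List[Replica], retry: bool = True):
        self.replicas = replicas
        self.retry = retry

    def call(self, request_id: int) -> bool:
        # Pick the starting replica round-robin so load is spread evenly.
        start = request_id % len(self.replicas)
        attempts = len(self.replicas) if self.retry else 1
        for offset in range(attempts):
            replica = self.replicas[(start + offset) % len(self.replicas)]
            try:
                replica.handle(request_id)
                return True
            except ConnectionError:
                continue
        return False


def success_rate(service: Service, requests: int) -> float:
    """Fraction of requests that the service answered successfully."""
    return sum(service.call(i) for i in range(requests)) / requests


def run_experiment(
    build_service: Callable[[], Service],
    inject_failure: Callable[[Service], None],
    requests: int = 300,
    tolerance: float = 0.01,  # assumed acceptable drop in success rate
) -> Dict[str, object]:
    """Compare a control copy against a copy with an injected failure."""
    control = build_service()
    experiment = build_service()
    inject_failure(experiment)  # only the experimental copy is disturbed
    control_rate = success_rate(control, requests)
    experiment_rate = success_rate(experiment, requests)
    return {
        "control": control_rate,
        "experiment": experiment_rate,
        "hypothesis_holds": control_rate - experiment_rate <= tolerance,
    }


def kill_first_replica(service: Service) -> None:
    """Simulated failure: the first replica stops responding."""
    service.replicas[0].healthy = False


def build(retry: bool) -> Callable[[], Service]:
    """Factory for a fresh three-replica service, with or without retries."""
    return lambda: Service([Replica("a"), Replica("b"), Replica("c")], retry=retry)


if __name__ == "__main__":
    print("with retries:   ", run_experiment(build(True), kill_first_replica))
    print("without retries:", run_experiment(build(False), kill_first_replica))
```

In this toy setup the service with retries keeps answering every request after one replica is lost, while the service without retries fails every request routed to the dead replica, so the comparison flags it. A real experiment would replace the in-memory classes with actual traffic and metrics, and would limit how much of the system is exposed to the injected failure.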
