# A Platform for Automating Chaos Experiments

**Authors:** Ali Basiri, Aaron Blohowiak, Lorin Hochstein, Casey Rosenthal

arXiv: 1702.05849 · 2017-02-21

## TL;DR

This paper introduces the Chaos Automation Platform, a system that automates failure injection experiments on Netflix's production system to verify that failures in non-critical services do not cause outages.

## Contribution

The paper presents a platform for automating chaos experiments directly in production, aimed at reliability testing of large-scale distributed systems composed of many interacting services.

## Key findings

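- Failure injection experiments can be automated and run directly against the production system
- The experiments verify that failures in non-critical services do not result in system outages
- In a system of many interacting services, failures in individual services are not uncommon, which motivates automated verification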

## Abstract

The Netflix video streaming system is composed of many interacting services. In such a large system, failures in individual services are not uncommon. This paper describes the Chaos Automation Platform, a system for running failure injection experiments on the production system to verify that failures in non-critical services do not result in system outages.
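As a rough illustration of the kind of check the abstract describes, the sketch below injects a failure into a non-critical call for a small experiment group and compares the success rate of the critical path against a control group. It is a minimal toy model, not the platform's implementation: every name in it (`Request`, `fetch_recommendations`, `start_playback`, `run_experiment`), the choice of recommendations as the non-critical service, and the control/experiment split are assumptions made for illustration.

```python
import random
from dataclasses import dataclass


class InjectedFailure(Exception):
    """Hypothetical error raised in place of a real dependency failure."""


@dataclass
class Request:
    user_id: int
    inject_failure: bool  # True only for requests in the experiment group


def fetch_recommendations(req: Request) -> list:
    """Hypothetical non-critical service call that can be made to fail."""
    if req.inject_failure:
        raise InjectedFailure("recommendations unavailable")
    return ["title-a", "title-b"]


def start_playback(req: Request) -> dict:
    """Hypothetical critical path: it should succeed even when the
    non-critical dependency fails, by falling back to an empty result."""
    try:
        recs = fetch_recommendations(req)
    except InjectedFailure:
        recs = []  # fallback keeps the critical path alive
    return {"playing": True, "recommendations": recs}


def run_experiment(n_users: int, experiment_fraction: float, seed: int = 0) -> dict:
    """Route a fraction of simulated users into the experiment group and
    return the critical-path success rate for each group."""
    rng = random.Random(seed)
    counts = {"control": [0, 0], "experiment": [0, 0]}  # [successes, total]
    for user_id in range(n_users):
        group = "experiment" if rng.random() < experiment_fraction else "control"
        req = Request(user_id, inject_failure=(group == "experiment"))
        result = start_playback(req)
        counts[group][1] += 1
        counts[group][0] += int(result["playing"])
    return {g: (s / t if t else None) for g, (s, t) in counts.items()}


if __name__ == "__main__":
    rates = run_experiment(n_users=1000, experiment_fraction=0.1)
    print(rates)
```

In this toy model, the experiment passes when the experiment group's success rate matches the control group's. A noticeably lower rate in the experiment group would suggest that the supposedly non-critical dependency is in fact critical.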

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1702.05849/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1702.05849/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/1702.05849/full.md

---
Source: https://tomesphere.com/paper/1702.05849