LLM-Powered Fully Automated Chaos Engineering: Towards Enabling Anyone to Build Resilient Software Systems at Low Cost
Daisuke Kikuta, Hiroki Ikeuchi, Kengo Tajiri

TL;DR
This paper introduces ChaosEater, an LLM-powered system that automates the entire chaos engineering process for Kubernetes-based systems, making resilience testing accessible and cost-effective for anyone.
Contribution
It presents a novel system that automates chaos engineering workflows using LLMs, reducing manual effort and costs in building resilient software systems.
Findings
Successfully automates CE cycles on Kubernetes systems
Reduces time and monetary costs significantly
Qualitatively validated by engineers and LLMs
Abstract
Chaos Engineering (CE) is an engineering technique aimed at improving the resilience of distributed systems. It involves intentionally injecting faults into a system to test its resilience, uncover weaknesses, and address them before they cause failures in production. Recent CE tools automate the execution of predefined CE experiments. However, planning such experiments and improving the system based on the experimental results still remain manual. These processes are labor-intensive and require multi-domain expertise. To address these challenges and enable anyone to build resilient systems at low cost, this paper proposes ChaosEater, a system that automates the entire CE cycle with Large Language Models (LLMs). It predefines an agentic workflow according to a systematic CE cycle and assigns subdivided processes within the workflow to LLMs. ChaosEater targets CE for software systems…
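The idea of a predefined workflow whose phases are each delegated to an agent can be sketched as follows. This is a minimal illustration under stated assumptions, not ChaosEater's implementation. The phase names (hypothesis, experiment, analysis, improvement) follow the commonly described CE cycle. The `Workflow` class, the stub agents, and the `replicas` field are all hypothetical, and each stub stands in for what would be an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# An "agent" maps the shared state to an updated state.
# Assumption: in a real system each agent would wrap an LLM call;
# here they are deterministic stubs so the sketch is runnable.
State = Dict[str, object]
Agent = Callable[[State], State]


@dataclass
class Workflow:
    """A fixed, ordered list of (phase name, agent) pairs."""
    phases: List[Tuple[str, Agent]] = field(default_factory=list)

    def add(self, name: str, agent: Agent) -> "Workflow":
        self.phases.append((name, agent))
        return self

    def run(self, state: State) -> State:
        trace = []
        for name, agent in self.phases:
            state = agent(dict(state))  # copy so each phase gets its own dict
            trace.append(name)
        state["trace"] = trace
        return state


def hypothesize(state: State) -> State:
    # Hypothetical steady-state hypothesis for the system under test.
    state["hypothesis"] = "service stays available when one replica fails"
    return state


def experiment(state: State) -> State:
    # Simulated fault injection: remove one replica and check availability.
    state["observed_available"] = int(state["replicas"]) - 1 >= 1
    return state


def analyze(state: State) -> State:
    state["hypothesis_holds"] = bool(state["observed_available"])
    return state


def improve(state: State) -> State:
    # Hypothetical remediation: add a replica if the hypothesis failed.
    if not state["hypothesis_holds"]:
        state["replicas"] = int(state["replicas"]) + 1
    return state


def run_cycle(replicas: int) -> State:
    workflow = (
        Workflow()
        .add("hypothesis", hypothesize)
        .add("experiment", experiment)
        .add("analysis", analyze)
        .add("improvement", improve)
    )
    return workflow.run({"replicas": replicas})


if __name__ == "__main__":
    print(run_cycle(1))
```

Keeping the phase order fixed in code and letting agents fill in only the content of each phase is one way to read "predefines an agentic workflow". A real system would replace the stubs with model calls and run the experiment against an actual cluster.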
