ChaosEater: Fully Automating Chaos Engineering with Large Language Models
Daisuke Kikuta, Hiroki Ikeuchi, Kengo Tajiri

TL;DR
ChaosEater automates chaos engineering for Kubernetes systems using large language models, reducing manual effort and costs by handling the entire CE cycle from requirement definition to testing.
Contribution
The paper introduces ChaosEater, a system that uses LLMs to carry out every chaos engineering task for Kubernetes environments, including the experiment definition and system improvement steps that existing CE tools leave manual.
Findings
Successfully completes CE cycles with low time and monetary costs.
The quality and reliability of its CE cycles were validated by both human engineers and LLMs.
Effective for both small and large Kubernetes systems.
Abstract
Chaos Engineering (CE) is an engineering technique aimed at improving the resiliency of distributed systems. It involves artificially injecting specific failures into a distributed system and observing its behavior in response. Based on these observations, the system can be proactively improved to handle those failures. Recent CE tools automate the execution of predefined CE experiments. However, defining these experiments and improving the system based on the experimental results remain manual tasks. To reduce the cost of these manual operations, we propose ChaosEater, a system that automates all CE operations with Large Language Models (LLMs). It predefines an agentic workflow according to a systematic CE cycle and assigns subdivided operations within the workflow to LLMs. ChaosEater targets CE for Kubernetes systems, which are managed through code (i.e., Infrastructure…
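The workflow idea in the abstract (a fixed CE cycle whose individual steps are delegated to an LLM) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the phase names, the `CycleState` container, `run_cycle`, and the stubbed `fake_llm` are hypothetical and are not taken from ChaosEater's implementation. A real system would presumably also call a CE tool and a Kubernetes cluster during the injection and observation steps; this sketch only records text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM call: prompt text in, response text out.
LLM = Callable[[str], str]


@dataclass
class CycleState:
    """Shared state passed through the workflow (illustrative only)."""
    manifests: str                                        # system definition as code
    notes: Dict[str, str] = field(default_factory=dict)   # output of each phase


# Illustrative phase names, loosely following the abstract's description:
# define an experiment, inject a failure, observe, then improve the system.
PHASES: List[str] = [
    "define_experiment",
    "inject_failure",
    "observe_behavior",
    "improve_system",
]


def run_cycle(state: CycleState, llm: LLM) -> CycleState:
    """Run the predefined phases in order, delegating each one to the LLM."""
    for phase in PHASES:
        prompt = (
            f"Phase: {phase}\n"
            f"Manifests:\n{state.manifests}\n"
            f"Prior notes: {state.notes}"
        )
        state.notes[phase] = llm(prompt)
    return state


def fake_llm(prompt: str) -> str:
    """Stub LLM that echoes the phase name, so the sketch runs offline."""
    phase = prompt.splitlines()[0].split(": ", 1)[1]
    return f"[{phase}] done"


if __name__ == "__main__":
    final = run_cycle(CycleState(manifests="kind: Deployment"), fake_llm)
    for name, note in final.notes.items():
        print(name, "->", note)
```

The point of fixing the phase order in code, rather than letting the model choose the next step, is that each LLM call receives a narrow, well-scoped task together with the outputs of the earlier phases.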
Taxonomy
Topics: Computational Physics and Python Applications · Topic Modeling
