Mutiny! How does Kubernetes fail, and what can we do about it?
Marco Barletta, Marcello Cinque, Catello Di Martino, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer

TL;DR
This paper systematically analyzes Kubernetes failures, develops a fault injection framework to replicate these failures, and demonstrates that targeted fault injections can effectively simulate real-world failure patterns, highlighting the need for proactive resiliency testing.
Contribution
It introduces a fault/error injection framework for Kubernetes, enabling realistic failure simulation and comparison with real-world failures, which was lacking in prior research.
Findings
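Even a single fault (e.g., a bit-flip) in the data store that holds the cluster state can propagate into cluster-wide failures (3% of injections), service networking issues (4%), and service under/overprovisioning (24%).
Errors in the fields that track dependencies between objects caused 51% of those cluster-wide failures.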
Fault injections can replicate many real-world failure patterns.
Abstract
In this paper, we i) analyze and classify real-world failures of Kubernetes (the most popular container orchestration system), ii) develop a framework to perform a fault/error injection campaign targeting the data store preserving the cluster state, and iii) compare results of our fault/error injection experiments with real-world failures, showing that our fault/error injections can recreate many real-world failure patterns. The paper aims to address the lack of studies on systematic analyses of Kubernetes failures to date. Our results show that even a single fault/error (e.g., a bit-flip) in the stored data can propagate, causing cluster-wide failures (3% of injections), service networking issues (4%), and service under/overprovisioning (24%). Errors in the fields tracking dependencies between objects caused 51% of such cluster-wide failures. We argue that controlled fault/error…
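To make the bit-flip fault model in the abstract concrete, here is a minimal sketch in Python. It is not the paper's injection framework and does not touch a real cluster or data store: the `record` dictionary is a hypothetical, simplified stand-in for a stored object, and the helper names `flip_int_bit` and `flip_byte_bit` are made up for illustration.

```python
def flip_int_bit(value: int, bit_index: int) -> int:
    """Return value with one bit inverted (a single bit-flip on an integer field)."""
    if bit_index < 0:
        raise ValueError("bit_index must be non-negative")
    return value ^ (1 << bit_index)


def flip_byte_bit(data: bytes, byte_index: int, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted in the chosen byte."""
    if not 0 <= byte_index < len(data):
        raise IndexError("byte_index out of range")
    if not 0 <= bit_index < 8:
        raise ValueError("bit_index must be in 0..7")
    corrupted = bytearray(data)
    corrupted[byte_index] ^= 1 << bit_index
    return bytes(corrupted)


# Hypothetical, simplified record; real stored objects are far richer.
record = {"kind": "Deployment", "name": "web", "replicas": 3, "owner": b"web"}

# Flipping bit 2 of the replica count turns 3 into 7 (overprovisioning);
# flipping bit 0 turns 3 into 2 (underprovisioning).
over = dict(record, replicas=flip_int_bit(record["replicas"], 2))
under = dict(record, replicas=flip_int_bit(record["replicas"], 0))

# Flipping one bit in a reference to another object yields a name that
# no longer matches its target, i.e. a broken dependency.
broken_ref = dict(record, owner=flip_byte_bit(record["owner"], 0, 0))

print(over["replicas"], under["replicas"], broken_ref["owner"])
```

The two integer flips mirror the under/overprovisioning outcome reported above, and the corrupted reference mirrors an error in a dependency-tracking field. Which fields the paper actually targets, and how it writes the corrupted values back, is not specified in this summary.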
