TL;DR
This paper introduces ChaosMachine, a chaos engineering system for Java that injects perturbations in production and analyzes exception handling at the level of individual try-catch blocks, revealing both strengths and weaknesses of a system's resilience code.
Contribution
It presents a novel Java-based chaos engineering tool that provides detailed analysis of exception handling in production environments, a previously underexplored area.
Findings
Revealed both strengths and weaknesses in the exception-handling code of large Java applications
Evaluated ChaosMachine on 3 large-scale, well-known Java applications totaling 630k lines of code
Provided actionable, try-catch-level analysis of exception-handling capabilities in production
Abstract
Software systems contain resilience code to handle the failures and unexpected events that happen in production. It is essential for developers to understand and assess the resilience of their systems. Chaos engineering is a technique that aims at assessing resilience and uncovering weaknesses by actively injecting perturbations in production. In this paper, we propose a novel design and implementation of a chaos engineering system in Java called ChaosMachine. It provides a unique and actionable analysis of exception-handling capabilities in production, at the level of try-catch blocks. To evaluate our approach, we have deployed ChaosMachine on top of 3 large-scale and well-known Java applications totaling 630k lines of code. Our results show that ChaosMachine reveals both strengths and weaknesses of the resilience code of a software system at the level of exception handling.
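To make the idea of perturbing a system at the level of try-catch blocks concrete, here is a minimal Java sketch. It is an illustration under stated assumptions, not ChaosMachine's implementation: the `Injector` hook, the two `load...` methods, and the "resilient" / "fragile" labels are all hypothetical names invented for this example, and the hook is written by hand rather than added by whatever instrumentation the actual tool uses. The sketch forces an exception at the start of a try block and then observes whether the catch block contains the failure or lets it escape to the caller.

```java
import java.io.IOException;
import java.util.function.Supplier;

public class TryCatchChaosSketch {

    /** Hypothetical perturbation hook: when armed, the next call throws once. */
    public static final class Injector {
        private boolean armed;

        public void arm() { armed = true; }

        public void maybeThrow() throws IOException {
            if (armed) {
                armed = false; // inject a single exception, then disarm
                throw new IOException("injected by chaos experiment");
            }
        }
    }

    public static final Injector INJECTOR = new Injector();

    /** A try-catch block whose handler recovers with a fallback value. */
    public static String loadWithFallback() {
        try {
            INJECTOR.maybeThrow(); // perturbation point at the start of the try block
            return "fresh";
        } catch (IOException e) {
            return "cached";
        }
    }

    /** A try-catch block whose handler rethrows, so the failure reaches the caller. */
    public static String loadWithoutFallback() {
        try {
            INJECTOR.maybeThrow(); // perturbation point at the start of the try block
            return "fresh";
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Arms the injector, runs the operation, and reports whether the failure was contained. */
    public static String classify(Supplier<String> operation) {
        INJECTOR.arm();
        try {
            operation.get();
            return "resilient"; // illustrative label: the handler absorbed the exception
        } catch (RuntimeException e) {
            return "fragile";   // illustrative label: the exception escaped the handler
        }
    }

    public static void main(String[] args) {
        System.out.println("loadWithFallback: "
                + classify(TryCatchChaosSketch::loadWithFallback));
        System.out.println("loadWithoutFallback: "
                + classify(TryCatchChaosSketch::loadWithoutFallback));
    }
}
```

Running `main` reports the first block as "resilient" and the second as "fragile". A real chaos engineering system would also have to monitor the effect on the running application, which this sketch reduces to a single check on whether an exception propagates.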
