CHAOS: Accurate and Realtime Detection of Aging-Oriented Failure Using Entropy
Pengfei Chen, Yong Qi, Di Hou

TL;DR
This paper introduces CHAOS, a real-time failure detection method using a novel entropy-based indicator, MMSE, which effectively predicts software aging and failures with high accuracy and low delay.
Contribution
The paper proposes MMSE, a new entropy-based aging indicator, and develops CHAOS, a failure detection approach that significantly improves accuracy and reduces detection delay.
Findings
CHAOS achieves about 5 times higher detection accuracy than previous methods.
CHAOS reduces the Ahead-Time-To-Failure (ATTF) by up to 3 orders of magnitude.
CHAOS operates efficiently enough for real-time failure detection.
Abstract
Even well-designed software systems suffer from chronic performance degradation, also called "software aging", due to internal (e.g. software bugs) and external (e.g. resource exhaustion) impairments. These chronic problems often fly under the radar of software monitoring systems before causing severe impacts (e.g. system failure). Detecting these problems in time to prevent a system crash is therefore challenging. Although many approaches have been proposed to solve this issue, their accuracy and effectiveness are still far from satisfactory because the aging indicators they adopt are inadequate. In this paper, we present a novel entropy-based aging indicator, Multidimensional Multi-scale Entropy (MMSE). MMSE employs the complexity embedded in runtime performance metrics to indicate software aging and leverages multi-scale and…
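The abstract is cut off before it says how MMSE is computed, so the following is only a rough illustration of the general idea of measuring complexity in a metric series at several time scales. It is a minimal sketch of standard univariate multiscale sample entropy, not the paper's MMSE. The function names, the default parameters (`m=2`, `r_factor=0.2`), and the single-metric input are all assumptions made for this example.

```python
import math
import random


def coarse_grain(series, scale):
    """Average consecutive non-overlapping windows of length `scale`."""
    n = len(series) // scale
    return [sum(series[i * scale:(i + 1) * scale]) / scale for i in range(n)]


def sample_entropy(series, m=2, r=0.2):
    """Sample entropy with template length m and absolute tolerance r.

    Returns float('inf') when no template matches are found.
    """
    n = len(series)

    def count_matches(length):
        # Count pairs of templates (i < j) whose first `length` points
        # all lie within r of each other (Chebyshev distance).
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if all(abs(series[i + k] - series[j + k]) <= r
                       for k in range(length)):
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")
    return -math.log(a / b)


def multiscale_entropy(series, scales=(1, 2, 3), m=2, r_factor=0.2):
    """Sample entropy of the coarse-grained series at each scale.

    The tolerance is r_factor times the standard deviation of the
    original series (an assumed, commonly used convention).
    """
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    r = r_factor * std
    return [sample_entropy(coarse_grain(series, s), m, r) for s in scales]


if __name__ == "__main__":
    # Hypothetical stand-in for one runtime performance metric.
    rng = random.Random(0)
    metric = [rng.random() for _ in range(200)]
    print(multiscale_entropy(metric))
```

A perfectly regular series yields an entropy of zero at scale 1, while irregular series yield larger values. How CHAOS combines several metrics and scales into one indicator is not stated in the text available here.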
Taxonomy
Topics: Software System Performance and Reliability · Advanced Software Engineering Methodologies · Software Reliability and Analysis Research
