Survey on Models and Techniques for Root-Cause Analysis
Marc Solé, Victor Muntés-Mulero, Annie Ibrahim Rana, Giovani Estrada

TL;DR
This survey reviews root-cause analysis models and techniques, emphasizing their scalability and performance in handling large data volumes in IoT and cloud systems, and offers guidance for selecting suitable methods.
Contribution
It uniquely focuses on the scalability and performance aspects of root-cause analysis techniques in large-scale, real-time systems, which previous surveys have not emphasized.
Findings
Highlights the importance of scalable root-cause analysis methods.
Provides a comparative overview of techniques based on performance and applicability.
Guides practitioners in selecting appropriate root-cause analysis strategies.
Abstract
Automation and computer intelligence to support complex human decisions become essential to manage large and distributed systems in the Cloud and IoT era. Understanding the root cause of an observed symptom in a complex system has been a major problem for decades. As industry dives into the IoT world and the amount of data generated per year grows at an amazing speed, an important question is how to find appropriate mechanisms to determine root causes that can handle huge amounts of data or may provide valuable feedback in real time. While many survey papers aim at summarizing the landscape of techniques for modelling system behavior and inferring the root cause of a problem based on the resulting models, none of those focuses on analyzing how the different techniques in the literature fit growing requirements in terms of performance and scalability. In this survey, we provide a review…
Taxonomy
Topics: Software System Performance and Reliability · Software Reliability and Analysis Research · Cloud Computing and Resource Management
