How Industry Tackles Anomalies during Runtime: Approaches and Key Monitoring Parameters
Monika Steidl, Benedikt Dornauer, Michael Felderer, Rudolf Ramler, Mircea-Cristian Racasan, Marko Gattringer

TL;DR
This paper explores how industries detect runtime anomalies in complex systems, highlighting current approaches, key monitoring parameters, and the preference for rule-based methods over AI in practical settings.
Contribution
It provides insights into industry practices, identifies essential runtime monitoring parameters, and explains the preference for rule-based approaches over AI-based methods.
Findings
Diversity in interpretations of runtime anomalies
Industry favors rule-based detection over AI approaches
Key monitoring parameters include logs, traces, and metrics
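As a rough illustration of what a rule-based check over runtime metrics can look like, here is a minimal Python sketch. It is not taken from the paper: the metric names (`latency_ms`, `error_rate`), the thresholds, and the `Rule` and `evaluate` helpers are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """A single hand-written rule over one metric (hypothetical structure)."""
    name: str
    metric: str
    # Returns True when the observed value should be flagged as anomalous.
    predicate: Callable[[float], bool]


def evaluate(rules: List[Rule], sample: Dict[str, float]) -> List[str]:
    """Return the names of all rules violated by one metrics sample."""
    violated = []
    for rule in rules:
        value = sample.get(rule.metric)
        # Metrics missing from the sample are skipped rather than flagged.
        if value is not None and rule.predicate(value):
            violated.append(rule.name)
    return violated


# Assumed thresholds, for illustration only.
RULES = [
    Rule("high_latency", "latency_ms", lambda v: v > 500.0),
    Rule("high_error_rate", "error_rate", lambda v: v > 0.05),
]

if __name__ == "__main__":
    print(evaluate(RULES, {"latency_ms": 750.0, "error_rate": 0.01}))
```

Each rule is explicit and can be read on its own, which is one plausible reason such checks are easier to adopt in practice than a learned model. The trade-off is that they only catch deviations someone anticipated when writing the rule.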
Abstract
Deviations from expected behavior during runtime, known as anomalies, have become more common due to the systems' complexity, especially for microservices. Consequently, analyzing runtime monitoring data, such as logs, traces for microservices, and metrics, is challenging due to the large volume of data collected. Developing effective rules or AI algorithms requires a deep understanding of this data to reliably detect unforeseen anomalies. This paper seeks to comprehend anomalies and current anomaly detection approaches across diverse industrial sectors. Additionally, it aims to pinpoint the parameters necessary for identifying anomalies via runtime monitoring data. Therefore, we conducted semi-structured interviews with fifteen industry participants who rely on anomaly detection during runtime. Additionally, to supplement information from the interviews, we performed a literature…
