EventADL: Open-Box Anomaly Detection and Localization Framework for Events in Cloud-Based Service Systems
Luan Pham, Victor Nicolet, Joey Dodds, Hui Guan, Daniel Kroening

TL;DR
EventADL is an open-box framework for anomaly detection and root cause localization in cloud-based service systems. It uses event data to improve both reliability and interpretability.
Contribution
It introduces the first event-based ADL framework, built on learned event patterns and intervention graphs. This fills a gap left by existing methods, which focus on metric and log data.
Findings
Achieves at least 90% F1-score in anomaly detection
Attains 100% top-3 accuracy in root cause localization
Outperforms existing methods on real-world cloud service data
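The two headline metrics are standard. F1 is the harmonic mean of precision and recall, and top-3 accuracy is the fraction of incidents whose true root cause appears among the first three ranked candidates. The sketch below shows how they are conventionally computed. It is not taken from the paper, and the function names are hypothetical.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), computed from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def top_k_accuracy(rankings: list[list[str]], truths: list[str], k: int = 3) -> float:
    """Fraction of incidents whose true root cause is in the top-k ranked candidates."""
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)


# Example: one incident's root cause is ranked 3rd (a hit), the other is missing (a miss).
print(top_k_accuracy([["a", "b", "c", "d"], ["x", "y", "z"]], ["c", "w"], k=3))
```

Under this definition, "100% top-3 accuracy" means that the true root cause was ranked within the first three candidates for every evaluated incident.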
Abstract
Anomaly detection and localization (ADL) is critical for maintaining reliability and availability in cloud systems. Recent ADL developments focus on metric and log data, leaving event data unexplored. To address this gap, we propose EventADL, the first open-box event-based ADL framework for cloud-based service systems. To motivate the design of our framework, we conduct a systematic analysis of 520 real-world incidents and provide insights into how anomalies and their root causes manifest in event data. EventADL has three phases: offline training, online anomaly detection, and root cause localization. During the training phase, EventADL first learns Event Semantic Patterns (ESPs), which capture normal interactions between system entities in historical event data, and then learns Event Frequency Patterns (EFPs), which capture the normal frequency of known ESPs. In the online…
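The idea of learning "normal interactions" (ESPs) and "normal frequencies" of those interactions (EFPs) can be illustrated with a minimal sketch. Everything here is an assumption, not EventADL's actual algorithm, which the truncated abstract does not specify: events are modeled as hypothetical `(source, action, target)` triples, ESPs are approximated as the set of triples seen in normal history windows, EFPs as the per-window mean and standard deviation of each triple's count, and a simple k-sigma rule stands in for online detection.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev

# Hypothetical event representation (assumption): (source_entity, action, target_entity).
Event = tuple[str, str, str]


def learn_esps(history: list[list[Event]]) -> set[Event]:
    """Approximate ESPs as every event triple observed in normal history windows."""
    return {event for window in history for event in window}


def learn_efps(history: list[list[Event]], esps: set[Event]) -> dict[Event, tuple[float, float]]:
    """Approximate EFPs as the mean and population std of each ESP's per-window count."""
    counts: dict[Event, list[int]] = defaultdict(list)
    for window in history:
        window_counts = Counter(window)
        for esp in sorted(esps):  # sorted for deterministic iteration order
            counts[esp].append(window_counts[esp])
    return {esp: (mean(v), pstdev(v)) for esp, v in counts.items()}


def detect(window: list[Event], esps: set[Event],
           efps: dict[Event, tuple[float, float]], k: float = 3.0) -> list[tuple[str, Event]]:
    """Flag unseen interactions and known interactions whose count deviates by more than k std."""
    window_counts = Counter(window)
    anomalies = []
    # Semantic check: an interaction never seen during training.
    for event in window_counts:
        if event not in esps:
            anomalies.append(("unknown_esp", event))
    # Frequency check: with zero std in training, any deviation is flagged.
    for esp, (mu, sigma) in efps.items():
        if abs(window_counts[esp] - mu) > k * sigma:
            anomalies.append(("frequency", esp))
    return anomalies


normal = [("api", "calls", "db"), ("api", "calls", "db"), ("lb", "routes", "api")]
history = [normal, normal]
esps = learn_esps(history)
efps = learn_efps(history, esps)
print(detect([("api", "calls", "db"), ("lb", "routes", "api"), ("api", "calls", "cache")], esps, efps))
```

In this toy run, the new window is flagged twice: once for the never-seen `("api", "calls", "cache")` interaction, and once because `("api", "calls", "db")` occurs once instead of its learned two times. Separating "what interacts" from "how often" mirrors the ESP/EFP split described in the abstract. The real framework's pattern definitions and thresholds may differ substantially.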
