TL;DR
This paper introduces Arvalus and D-Arvalus, neural graph transformation methods that model dependencies in distributed cloud systems to improve anomaly detection and localization, supporting faster issue mitigation.
Contribution
The paper presents a novel neural graph transformation approach that explicitly models component dependencies to enhance anomaly detection and localization in distributed cloud applications.
Findings
Arvalus shows good prediction performance in experiments.
D-Arvalus benefits from dependency information, improving localization.
Synthetic anomaly injection validates the approach's effectiveness.
Abstract
Operation and maintenance of large distributed cloud applications can quickly become unmanageably complex, putting human operators under immense stress when problems occur. Utilizing machine learning for identification and localization of anomalies in such systems supports human experts and enables fast mitigation. However, due to the various inter-dependencies of system components, anomalies not only affect their origin but also propagate through the distributed system. Taking this into account, we present Arvalus and its variant D-Arvalus, a neural graph transformation method that models system components as nodes and their dependencies and placement as edges to improve the identification and localization of anomalies. Given a series of metric KPIs, our method predicts the most likely system state - either normal or an anomaly class - and performs localization when an anomaly is…
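To make the graph formulation in the abstract concrete, here is a minimal sketch in Python of how components could become nodes, with edges drawn from dependencies and shared placement. It is not the authors' implementation: the component names, hosts, KPI values, and the helpers `build_graph` and `aggregate` are all hypothetical, and the mean aggregation stands in for whatever learned graph transformation Arvalus and D-Arvalus actually use.

```python
from collections import defaultdict

def build_graph(dependencies, placement):
    """Return an undirected adjacency map over component names.

    Edges come from two sources (an assumption for this sketch):
    declared dependencies, and co-placement on the same host.
    """
    adj = defaultdict(set)
    for a, b in dependencies:
        adj[a].add(b)
        adj[b].add(a)
    by_host = defaultdict(list)
    for comp, host in placement.items():
        adj[comp]  # touching the key registers isolated components as nodes
        by_host[host].append(comp)
    for comps in by_host.values():
        for a in comps:
            for b in comps:
                if a != b:
                    adj[a].add(b)
    return {node: sorted(neigh) for node, neigh in adj.items()}

def aggregate(features, adj):
    """One round of mean aggregation over each node and its neighbours.

    A generic message-passing step, used here only to show how an
    anomaly signal on one node can reach the nodes connected to it.
    """
    out = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in adj.get(node, [])]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out

# Hypothetical three-component system with made-up KPI features.
dependencies = [("frontend", "backend"), ("backend", "db")]
placement = {"frontend": "host-1", "backend": "host-1", "db": "host-2"}
features = {
    "frontend": [0.2, 0.1],
    "backend": [0.9, 0.8],  # elevated KPIs, standing in for an anomaly
    "db": [0.3, 0.2],
}

adj = build_graph(dependencies, placement)
smoothed = aggregate(features, adj)
```

After one aggregation round, the elevated KPIs of `backend` raise the aggregated features of both `frontend` and `db`. This mirrors the propagation effect the abstract describes, and shows why per-component detection without dependency information can struggle to localize the origin.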
