Learning Dependencies in Distributed Cloud Applications to Identify and Localize Anomalies

Dominik Scheinert; Alexander Acker; Lauritz Thamsen; Morgan K. Geldenhuys; Odej Kao

arXiv:2103.05245 · cs.DC · September 10, 2021

TL;DR

This paper introduces Arvalus and D-Arvalus, neural graph transformation methods that model dependencies in distributed cloud systems to improve anomaly detection and localization, supporting faster issue mitigation.

Contribution

The paper presents a novel neural graph transformation approach that explicitly models component dependencies to enhance anomaly detection and localization in distributed cloud applications.

Findings

01 Arvalus shows good prediction performance in experiments.

02 D-Arvalus benefits from dependency information, improving localization.

03 Synthetic anomaly injection validates the approach's effectiveness.

Abstract

Operation and maintenance of large distributed cloud applications can quickly become unmanageably complex, putting human operators under immense stress when problems occur. Utilizing machine learning for identification and localization of anomalies in such systems supports human experts and enables fast mitigation. However, due to the various inter-dependencies of system components, anomalies do not only affect their origin but propagate through the distributed system. Taking this into account, we present Arvalus and its variant D-Arvalus, a neural graph transformation method that models system components as nodes and their dependencies and placement as edges to improve the identification and localization of anomalies. Given a series of metric KPIs, our method predicts the most likely system state - either normal or an anomaly class - and performs localization when an anomaly is…
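
To make the graph framing in the abstract concrete, the following is a minimal sketch of representing components as nodes and dependencies as edges, then mixing each node's KPI features with those of its neighbors. It is not the Arvalus or D-Arvalus architecture: the mean aggregation is a hand-written stand-in for a learned graph transformation, and the component names, feature values, and the helpers `build_graph` and `aggregate` are hypothetical. Placement edges could be added to the edge list in the same way.

```python
# Illustrative only: a toy component graph with one round of neighbor averaging.
# All names and numbers below are assumptions, not taken from the paper.

def build_graph(components, dependencies):
    """Return an undirected adjacency map that includes a self-loop per node."""
    neighbors = {c: {c} for c in components}
    for a, b in dependencies:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def aggregate(features, neighbors):
    """Average each node's KPI vector over itself and its direct neighbors."""
    out = {}
    for node, nbrs in neighbors.items():
        vecs = [features[n] for n in nbrs]
        out[node] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return out


# Hypothetical three-tier application and per-component KPI vectors
# (for example, normalized CPU and latency readings).
components = ["frontend", "backend", "database"]
dependencies = [("frontend", "backend"), ("backend", "database")]
features = {
    "frontend": [0.2, 0.1],
    "backend": [0.9, 0.8],
    "database": [0.3, 0.1],
}

neighbors = build_graph(components, dependencies)
smoothed = aggregate(features, neighbors)
```

In this toy setup the elevated readings on `backend` raise the aggregated values of both adjacent components, which loosely mirrors the abstract's point that anomalies propagate along dependencies rather than staying at their origin.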

Code & Models

Repositories

mcd01/arvalus-experiments
PyTorch · Official
