DeCorus: Hierarchical Multivariate Anomaly Detection at Cloud-Scale
Bruno Wassermann, David Ohana, Ronen Schaffer, Robert Shahla, Elliot K. Kolodner, Eran Raichstein, Michal Malka

TL;DR
DeCorus is a scalable hierarchical multivariate anomaly detection method designed for large-scale telemetry data, improving incident detection accuracy in cloud environments by leveraging domain knowledge and statistical techniques.
Contribution
The paper introduces DeCorus, a linear-complexity, hierarchical anomaly detection approach that enhances detection relevance using domain knowledge and extends statistical methods for noisy signals.
Findings
DeCorus outperforms five alternative anomaly detectors on real-world data.
DeCorus effectively detects incidents in large-scale syslog data.
All tested detectors face challenges with the dataset complexity.
Abstract
Multivariate anomaly detection can be used to identify outages within large volumes of telemetry data for computing systems. However, developing an efficient anomaly detector that can provide users with relevant information is a challenging problem. We introduce our approach to hierarchical multivariate anomaly detection called DeCorus, a statistical multivariate anomaly detector which achieves linear complexity. It extends standard statistical techniques to improve their ability to find relevant anomalies within noisy signals and makes use of types of domain knowledge that system operators commonly possess to compute system-level anomaly scores. We describe the implementation of DeCorus as an online log anomaly detection tool for network device syslog messages deployed at a cloud service provider. We use real-world data sets that consist of billion network device syslog messages and…
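The abstract does not spell out how per-signal scores become a system-level score, so the following is only a minimal sketch of the general idea, not the DeCorus algorithm. It assumes a robust z-score (median and MAD) per device, a max over each operator-defined device group, and a weighted mean across groups. The function names, the `groups` and `weights` inputs, and the choice of aggregation are all hypothetical. Each device is visited once per window, so the cost per window is linear in the number of devices.

```python
from statistics import median


def robust_z(history, value):
    """Robust z-score of `value` against `history`, using median and MAD."""
    med = median(history)
    mad = median([abs(x - med) for x in history])
    # 1.4826 rescales MAD to match a standard deviation for Gaussian data.
    # Fall back to 1.0 when the history is constant (MAD of zero).
    scale = 1.4826 * mad or 1.0
    return abs(value - med) / scale


def system_score(histories, current, groups, weights):
    """Aggregate device scores into group scores, then one system score.

    histories: device -> past per-window message counts
    current:   device -> message count in the newest window
    groups:    group name -> devices in it (hypothetical operator knowledge)
    weights:   group name -> importance weight (hypothetical operator knowledge)
    """
    device = {d: robust_z(histories[d], current[d]) for d in current}
    # Assumed aggregation: a group is as anomalous as its worst device.
    group = {g: max(device[d] for d in members) for g, members in groups.items()}
    total = sum(weights[g] for g in group)
    score = sum(weights[g] * s for g, s in group.items()) / total
    return score, group, device


# Hypothetical example: one edge switch suddenly logs far more messages.
histories = {"sw1": [10, 12, 11, 9, 10], "sw2": [5, 5, 5, 5, 5]}
groups = {"edge": ["sw1"], "core": ["sw2"]}
weights = {"edge": 1.0, "core": 3.0}

quiet, _, _ = system_score(histories, {"sw1": 10, "sw2": 5}, groups, weights)
spike, by_group, by_device = system_score(
    histories, {"sw1": 100, "sw2": 5}, groups, weights
)
```

In this sketch the weights stand in for the kind of domain knowledge an operator might hold, such as which device groups matter most. A real deployment would likely need a more careful treatment of noisy signals than a single robust z-score.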
Taxonomy
Topics: Network Security and Intrusion Detection · Anomaly Detection Techniques and Applications · Software System Performance and Reliability
