Multivariate Log-based Anomaly Detection for Distributed Database
Lingzhe Zhang, Tong Jia, Mengxi Jia, Ying Li, Yong Yang, and Zhonghai Wu

TL;DR
This paper introduces a new multivariate log-based anomaly detection method for distributed databases, supported by a comprehensive open-source dataset, revealing the limitations of single-node analysis and demonstrating improved detection accuracy.
Contribution
The paper presents the first open-source multivariate log dataset for distributed databases and proposes MultiLog, a novel anomaly detection approach tailored for such systems.
Findings
MultiLog outperforms existing methods by approximately 12%.
Single-node log analysis is insufficient for accurate anomaly detection.
The new dataset reveals unique distributed database anomalies.
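To make the second finding concrete, here is a minimal Python sketch. It is not the paper's MultiLog method. It assumes hypothetical per-node anomaly scores in [0, 1] and compares the single-node rule (flag the whole cluster when any one node looks abnormal) with a hypothetical rule that requires a fraction of nodes to agree. The function names, the threshold, and the score values are all illustrative assumptions.

```python
from typing import Sequence


def any_node_rule(scores: Sequence[float], threshold: float = 0.5) -> bool:
    """Flag the cluster if any single node's score exceeds the threshold."""
    return any(s > threshold for s in scores)


def quorum_rule(scores: Sequence[float], threshold: float = 0.5,
                min_fraction: float = 0.5) -> bool:
    """Flag the cluster only if at least min_fraction of nodes exceed the threshold.

    Hypothetical aggregation for illustration only; not the MultiLog model.
    """
    if not scores:
        return False
    flagged = sum(1 for s in scores if s > threshold)
    return flagged / len(scores) >= min_fraction


# Made-up scores: one noisy node in an otherwise healthy cluster.
healthy_with_noise = [0.1, 0.7, 0.2, 0.1]
# Made-up scores: most nodes look abnormal.
cluster_fault = [0.8, 0.9, 0.6, 0.2]

print(any_node_rule(healthy_with_noise), quorum_rule(healthy_with_noise))
print(any_node_rule(cluster_fault), quorum_rule(cluster_fault))
```

With these made-up scores, the single-node rule flags both clusters, while the quorum rule flags only the second. A fixed quorum is a crude stand-in: it would also miss a genuine fault confined to one node, which is one reason a learned cluster-level model may be preferable.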
Abstract
Distributed databases are fundamental infrastructures of today's large-scale software systems such as cloud systems. Detecting anomalies in distributed databases is essential for maintaining software availability. Existing approaches, predominantly developed using Loghub (a comprehensive collection of log datasets from various systems), lack datasets specifically tailored to distributed databases, which exhibit unique anomalies. Additionally, there is a notable absence of datasets encompassing multi-anomaly, multi-node logs. Consequently, models built upon these datasets, primarily designed for standalone systems, are inadequate for distributed databases, and the prevalent method of deeming an entire cluster anomalous based on irregularities in a single node leads to a high false-positive rate. This paper addresses the unique anomalies and multivariate nature of logs in distributed…
