TL;DR
This paper critically evaluates five state-of-the-art deep learning models for log-based anomaly detection on four public log datasets and finds that, despite high reported accuracy, current approaches still face significant challenges and limitations.
Contribution
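The study provides an in-depth analysis of how evaluation choices (training data selection, data grouping, class distribution, data noise, and early detection ability) affect state-of-the-art deep learning models for log anomaly detection, and highlights problems that remain unresolved.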
Findings
Model performance varies significantly with data conditions
High reported accuracy may be misleading due to evaluation issues
Log anomaly detection remains an unsolved challenge
Abstract
Software-intensive systems produce logs for troubleshooting purposes. Recently, many deep learning models have been proposed to automatically detect system anomalies based on log data. These models typically claim very high detection accuracy. For example, most models report an F-measure greater than 0.9 on the commonly-used HDFS dataset. To achieve a profound understanding of how far we are from solving the problem of log-based anomaly detection, in this paper, we conduct an in-depth analysis of five state-of-the-art deep learning-based models for detecting system anomalies on four public log datasets. Our experiments focus on several aspects of model evaluation, including training data selection, data grouping, class distribution, data noise, and early detection ability. Our results point out that all these aspects have significant impact on the evaluation, and that all the studied…
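For readers unfamiliar with the metric, the F-measure quoted in the abstract is the harmonic mean of precision and recall. The sketch below uses the standard definition and is not the paper's evaluation code; the function names and the detector rates in it are hypothetical, chosen only to illustrate why class distribution can change a reported score even when the detector itself is unchanged.

```python
def f_measure(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels, where 1 = anomaly."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def f1_from_rates(tpr, fpr, anomaly_ratio):
    """Expected F1 of a detector with fixed true/false positive rates
    when a fraction `anomaly_ratio` of the test sequences is anomalous.
    All three inputs are hypothetical illustration values."""
    tp = tpr * anomaly_ratio
    fn = (1 - tpr) * anomaly_ratio
    fp = fpr * (1 - anomaly_ratio)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


if __name__ == "__main__":
    # Same hypothetical detector, two different class distributions.
    for ratio in (0.5, 0.01):
        print(ratio, round(f1_from_rates(tpr=0.9, fpr=0.01, anomaly_ratio=ratio), 3))
```

With identical true and false positive rates, the expected F-measure drops as anomalies become rarer, because the same false positive rate produces more false alarms relative to true detections. This is one reason a score reported on one dataset split may not carry over to another.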
