A Critical Review of Common Log Data Sets Used for Evaluation of Sequence-based Anomaly Detection Techniques
Max Landauer, Florian Skopik, Markus Wurzenberger

TL;DR
This paper critically reviews common log data sets used for evaluating sequence-based anomaly detection, revealing that many anomalies are not sequential and simple methods often suffice.
Contribution
It provides an in-depth analysis of six public log data sets, questioning their suitability for evaluating advanced sequence-based anomaly detection techniques.
Findings
Most anomalies are not related to sequential patterns.
Simple detection techniques can achieve high detection rates.
Advanced methods may not be necessary for these data sets.
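To make "simple detection techniques" concrete, here is a minimal sketch of one possible order-agnostic baseline: flag any event sequence that contains an event type never seen in normal training data. This is an illustrative assumption, not necessarily one of the detectors evaluated in the paper. The function names and the event identifiers (E1, E2, ...) are hypothetical.

```python
from typing import Iterable, List, Set


def fit_known_events(normal_sequences: Iterable[List[str]]) -> Set[str]:
    """Collect every event type that occurs in the normal training sequences."""
    known: Set[str] = set()
    for seq in normal_sequences:
        known.update(seq)
    return known


def is_anomalous(sequence: List[str], known: Set[str]) -> bool:
    """Flag a sequence if it contains at least one unseen event type.

    The order of events is ignored entirely, so this baseline cannot
    detect anomalies that are purely sequential.
    """
    return any(event not in known for event in sequence)


# Hypothetical toy data: event IDs stand in for parsed log templates.
train = [["E1", "E2", "E3"], ["E1", "E3", "E2"]]
known = fit_known_events(train)

test_sequences = [
    ["E2", "E1", "E3"],  # reordered, but only known event types
    ["E1", "E4", "E3"],  # contains the unseen event type E4
]
flags = [is_anomalous(seq, known) for seq in test_sequences]
```

The reordered first test sequence is not flagged, while the second is. A data set on which such a check already detects most anomalies tells little about a method's ability to model sequential patterns.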
Abstract
Log data store event execution patterns that correspond to underlying workflows of systems or applications. While most logs are informative, log data also include artifacts that indicate failures or incidents. Accordingly, log data are often used to evaluate anomaly detection techniques that aim to automatically disclose unexpected or otherwise relevant system behavior patterns. Recently, detection approaches leveraging deep learning have increasingly focused on anomalies that manifest as changes of sequential patterns within otherwise normal event traces. Several publicly available data sets, such as HDFS, BGL, Thunderbird, OpenStack, and Hadoop, have since become standards for evaluating these anomaly detection techniques; however, the appropriateness of these data sets has not been closely investigated in the past. In this paper we therefore analyze six publicly available log data…
Taxonomy
Topics: Software System Performance and Reliability · Anomaly Detection Techniques and Applications · Data Quality and Management
