On the Effectiveness of Log Representation for Log-based Anomaly Detection
Xingfang Wu, Heng Li, Foutse Khomh

TL;DR
This paper systematically compares six log representation techniques across multiple datasets and machine learning models to determine their impact on log-based anomaly detection performance, providing practical guidelines for selecting optimal methods.
Contribution
It offers a comprehensive evaluation of log representation techniques and their effects on anomaly detection, filling a gap in understanding their relative effectiveness.
Findings
Certain log representation techniques outperform others on specific datasets.
Log parsing and feature aggregation significantly influence detection accuracy.
Guidelines are provided for choosing suitable log representations in practice.
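The finding that log parsing and feature aggregation influence detection accuracy can be made concrete with a small sketch. This is not the authors' pipeline. It assumes a crude regex-masking parser as a stand-in for a real log parser such as Drain, assumes sessions are already identified (for example by HDFS block ID), and uses the hypothetical helper names `to_template` and `message_count_vectors`. Raw messages are reduced to templates, and each session is then aggregated into a message count vector, a commonly used log representation.

```python
import re
from collections import Counter

# Hypothetical masking rules: a crude stand-in for a real log parser (e.g. Drain).
# Order matters: block IDs and IPs are masked before bare numbers.
_MASKS = [
    (re.compile(r"blk_-?\d+"), "<BLK>"),
    (re.compile(r"\d+\.\d+\.\d+\.\d+(:\d+)?"), "<IP>"),
    (re.compile(r"0x[0-9a-fA-F]+"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(message: str) -> str:
    """Replace variable tokens with placeholders to approximate a log template."""
    for pattern, placeholder in _MASKS:
        message = pattern.sub(placeholder, message)
    return message


def message_count_vectors(logs):
    """Aggregate parsed logs into one count vector per session.

    `logs` is a list of (session_id, message) pairs. Returns the template
    vocabulary (in order of first appearance) and a dict mapping each
    session_id to a list of per-template counts.
    """
    templates = {}  # template string -> column index
    per_session = {}  # session_id -> Counter of column indices
    for session_id, message in logs:
        template = to_template(message)
        column = templates.setdefault(template, len(templates))
        per_session.setdefault(session_id, Counter())[column] += 1
    vocab = list(templates)
    vectors = {
        sid: [counts.get(i, 0) for i in range(len(vocab))]
        for sid, counts in per_session.items()
    }
    return vocab, vectors


if __name__ == "__main__":
    # Hypothetical HDFS-like sample: two sessions keyed by block ID.
    sample = [
        ("blk_1", "Receiving block blk_1 src: /10.0.0.1:50010"),
        ("blk_1", "Receiving block blk_1 src: /10.0.0.2:50010"),
        ("blk_2", "Receiving block blk_2 src: /10.0.0.3:50010"),
        ("blk_2", "Deleting block blk_2 file /data/1"),
    ]
    vocab, vectors = message_count_vectors(sample)
    print(vocab)
    print(vectors)
```

Changing the masking rules or the grouping key changes which columns exist and how counts accumulate, which is one way parsing and aggregation choices propagate to the downstream detector.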
Abstract
Logs are an essential source of information for understanding the running status of a software system. As modern software architectures and maintenance practices evolve, more research effort has been devoted to automated log analysis. In particular, machine learning (ML) has been widely used in log analysis tasks, and in ML-based log analysis, converting textual log data into numerical feature vectors is a critical and indispensable step. However, the impact of different log representation techniques on the performance of downstream models is unclear, which makes it difficult for researchers and practitioners to choose the optimal log representation techniques for their automated log analysis workflows. Therefore, this work investigates and compares the log representation techniques commonly adopted in previous log analysis research. Particularly, we…
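To illustrate what converting textual log data into numerical feature vectors can look like, the sketch below computes TF-IDF weights for individual log messages using only the Python standard library. TF-IDF serves here only as an example of a text-based representation; the tokenizer, the helper names `tokenize` and `tfidf_vectors`, and the weighting details are assumptions, not the paper's configuration. The IDF term uses the smoothed form that scikit-learn's `TfidfVectorizer` applies by default, but unlike that class, term frequency is divided by message length and no L2 normalisation is applied.

```python
import math
import re
from collections import Counter


def tokenize(message):
    """Lowercase alphabetic tokens; digits and punctuation are dropped (an assumption)."""
    return re.findall(r"[a-z]+", message.lower())


def tfidf_vectors(messages):
    """Return (vocabulary, vectors) with one TF-IDF vector per message.

    TF  = count of the token in the message / number of tokens in the message.
    IDF = log((1 + N) / (1 + df)) + 1, where N is the number of messages and
          df is the number of messages containing the token.
    """
    docs = [tokenize(m) for m in messages]
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    # Document frequency: count each token at most once per message.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty messages
        vectors.append([counts[tok] / length * idf[tok] for tok in vocab])
    return vocab, vectors


if __name__ == "__main__":
    vocab, vecs = tfidf_vectors(["Receiving block src", "Deleting block file"])
    print(vocab)
    for v in vecs:
        print([round(x, 3) for x in v])
```

Tokens shared by every message (such as `block` in the usage example) receive the minimum IDF weight of 1, so rarer tokens dominate the resulting vectors.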
Taxonomy
Topics: Software System Performance and Reliability · Software Engineering Research · Anomaly Detection Techniques and Applications
