Log Parsing Evaluation in the Era of Modern Software Systems
Stefan Petrescu, Floris den Hengst, Alexandru Uta, Jan S. Rellermeyer

TL;DR
This paper evaluates the effectiveness of existing log parsing methods on diverse real-world datasets, identifies their limitations, and introduces Logchimera, a tool for estimating log parsing performance in industrial environments.
Contribution
It critically assesses current log parsing approaches on multiple datasets and proposes Logchimera, a tool for synthetic log data generation to improve industry applicability.
Findings
Existing log parsing methods struggle with heterogeneous real-world logs.
Evaluation on nine public datasets, one combined public dataset, and one dataset from a large bank reveals significant performance limitations.
Logchimera enables industry-specific log parsing performance estimation.
Abstract
Due to the complexity and size of modern software systems, the amount of logs generated is tremendous. Hence, it is infeasible to manually investigate these data in a reasonable time, thereby requiring automating log analysis to derive insights about the functioning of the systems. Motivated by an industry use-case, we zoom in on one integral part of automated log analysis, log parsing, which is the prerequisite to deriving any insights from logs. Our investigation reveals problematic aspects within the log parsing field, particularly its inefficiency in handling heterogeneous real-world logs. We show this by assessing the 14 most-recognized log parsing approaches in the literature using (i) nine publicly available datasets, (ii) one dataset composed of combined publicly available data, and (iii) one dataset generated within the infrastructure of a large bank. Subsequently, toward…
Taxonomy
Topics: Software System Performance and Reliability · Business Process Modeling and Analysis · Data Quality and Management
