On Sequence-to-Sequence Models for Automated Log Parsing
Adam Sorrenti, Andriy Miranskyy

TL;DR
This paper systematically evaluates sequence-to-sequence models for automated log parsing, comparing Transformer, Mamba state-space, monodirectional LSTM, and bidirectional LSTM architectures on parsing accuracy and computational cost.
Contribution
It provides a controlled empirical comparison of four sequence modelling architectures for log parsing, covering both parsing accuracy and computational cost.
Findings
Transformers achieve the lowest parsing error.
Mamba offers competitive accuracy with lower computational cost.
Character-level tokenization improves performance.
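To make the character-level tokenization finding concrete, here is a minimal sketch of what such a representation could look like for a log line. The helper names (build_vocab, encode, decode) and the reserved ids for padding and unknown characters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of character-level tokenization for log lines.
# All names and the reserved-id layout are hypothetical, for illustration only.

PAD_ID = 0  # assumed id reserved for padding
UNK_ID = 1  # assumed id reserved for characters not seen when building the vocab


def build_vocab(lines):
    """Map every distinct character in the given lines to an integer id."""
    chars = sorted({ch for line in lines for ch in line})
    # Ids start at 2 because 0 and 1 are reserved above.
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode(line, vocab):
    """Turn a log line into a list of character ids."""
    return [vocab.get(ch, UNK_ID) for ch in line]


def decode(ids, vocab):
    """Invert encode; reserved or unknown ids are rendered as '?'."""
    inverse = {i: ch for ch, i in vocab.items()}
    return "".join(inverse.get(i, "?") for i in ids)


if __name__ == "__main__":
    vocab = build_vocab(["ERROR disk full", "INFO job started"])
    ids = encode("ERROR job full", vocab)
    print(ids)
    print(decode(ids, vocab))
```

A character vocabulary stays small and rarely meets an unseen symbol, which is one plausible reason it suits heterogeneous log formats; the summary itself does not state why it helps.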
Abstract
Log parsing is a critical standard operating procedure in software systems, enabling monitoring, anomaly detection, and failure diagnosis. However, automated log parsing remains challenging due to heterogeneous log formats, distribution shifts between training and deployment data, and the brittleness of rule-based approaches. This study aims to systematically evaluate how sequence modelling architecture, representation choice, sequence length, and training data availability influence automated log parsing performance and computational cost. We conduct a controlled empirical study comparing four sequence modelling architectures: Transformer, Mamba state-space, monodirectional LSTM, and bidirectional LSTM models. In total, 396 models are trained across multiple dataset configurations and evaluated using relative Levenshtein edit distance with statistical significance testing. Transformer…
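The evaluation metric named in the abstract, relative Levenshtein edit distance, can be sketched as follows. The abstract does not say how the distance is normalized, so dividing by the length of the longer string is an assumption here, and the function names are hypothetical.

```python
# Sketch of a relative (normalized) Levenshtein edit distance.
# Assumption: normalization by the longer string's length; the paper may differ.


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def relative_edit_distance(predicted: str, reference: str) -> float:
    """Edit distance scaled to [0, 1]; 0.0 means the strings are identical."""
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 0.0  # two empty strings are treated as a perfect match
    return levenshtein(predicted, reference) / longest


if __name__ == "__main__":
    print(relative_edit_distance("user <*> logged in", "user <*> logged out"))
```

Under this normalization, a lower score means the parsed output is closer to the reference, which matches the sense in which the findings report "lowest parsing error".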
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Business Process Modeling and Analysis
