On Sequence-to-Sequence Models for Automated Log Parsing

Adam Sorrenti; Andriy Miranskyy

arXiv:2602.07698 · cs.SE · February 10, 2026

TL;DR

This paper systematically evaluates sequence-to-sequence models for automated log parsing, comparing four architectures (Transformer, Mamba state-space, monodirectional LSTM, and bidirectional LSTM) on parsing accuracy and computational cost.

Contribution

The paper provides a controlled empirical comparison of sequence modelling architectures for log parsing. It finds that Transformers achieve the lowest parsing error, while Mamba offers competitive accuracy at lower computational cost.

Findings

01. Transformers achieve the lowest parsing error.

02. Mamba offers competitive accuracy with lower computational cost.

03. Character-level tokenization improves performance.
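
To make the third finding concrete, the sketch below contrasts character-level tokenization with a simple whitespace tokenizer. This summary does not describe the paper's actual tokenizers or the alternatives it compared, so the whitespace baseline, the helper names (`char_tokenize`, `whitespace_tokenize`, `build_vocab`, `out_of_vocab`), and the log lines are all hypothetical. The example shows only one plausible reason character-level input can help: a previously unseen value, such as a new IP address, still decomposes into known characters.

```python
from typing import Callable, Dict, Iterable, List

Tokenizer = Callable[[str], List[str]]


def char_tokenize(line: str) -> List[str]:
    """Character-level tokenization: every character is one token."""
    return list(line)


def whitespace_tokenize(line: str) -> List[str]:
    """Hypothetical word-level baseline: split on whitespace."""
    return line.split()


def build_vocab(lines: Iterable[str], tokenize: Tokenizer) -> Dict[str, int]:
    """Assign an integer id to each distinct token seen in the training lines."""
    vocab: Dict[str, int] = {}
    for line in lines:
        for token in tokenize(line):
            vocab.setdefault(token, len(vocab))
    return vocab


def out_of_vocab(line: str, vocab: Dict[str, int], tokenize: Tokenizer) -> List[str]:
    """Return the tokens of `line` that the vocabulary has never seen."""
    return [token for token in tokenize(line) if token not in vocab]


# Hypothetical log lines, invented for illustration only.
train_lines = ["Accepted password for root from 10.0.0.1"]
new_line = "Accepted password for root from 10.0.1.10"

char_vocab = build_vocab(train_lines, char_tokenize)
word_vocab = build_vocab(train_lines, whitespace_tokenize)

char_oov = out_of_vocab(new_line, char_vocab, char_tokenize)
word_oov = out_of_vocab(new_line, word_vocab, whitespace_tokenize)
```

Here the unseen IP address is a single unknown token for the whitespace tokenizer, while every character of the new line already appears in the character vocabulary. The trade-off is that character-level sequences are much longer, which interacts with the sequence-length factor the study also varies.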

Abstract

Log parsing is a critical standard operating procedure in software systems, enabling monitoring, anomaly detection, and failure diagnosis. However, automated log parsing remains challenging due to heterogeneous log formats, distribution shifts between training and deployment data, and the brittleness of rule-based approaches. This study aims to systematically evaluate how sequence modelling architecture, representation choice, sequence length, and training data availability influence automated log parsing performance and computational cost. We conduct a controlled empirical study comparing four sequence modelling architectures: Transformer, Mamba state-space, monodirectional LSTM, and bidirectional LSTM models. In total, 396 models are trained across multiple dataset configurations and evaluated using relative Levenshtein edit distance with statistical significance testing. Transformer…
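
The evaluation metric named in the abstract, relative Levenshtein edit distance, can be sketched as follows. The truncated abstract does not state the exact normalisation, so dividing by the length of the longer string is an assumption made here, and the function names and example templates are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    # Keep the shorter string in the inner loop so each row is as small as possible.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                       # delete char_a
                    current[j - 1] + 1,                    # insert char_b
                    previous[j - 1] + substitution_cost,   # substitute or match
                )
            )
        previous = current
    return previous[-1]


def relative_levenshtein(predicted: str, reference: str) -> float:
    """Edit distance scaled to [0, 1].

    Assumption: normalise by the longer string's length; the paper may
    normalise differently (for example, by the reference length).
    """
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return levenshtein(predicted, reference) / longest


# Hypothetical predicted and reference templates, invented for illustration.
predicted_template = "Connection from <*> closed"
reference_template = "Connection from <*> port <*> closed"
error = relative_levenshtein(predicted_template, reference_template)
```

Under this normalisation, a score of 0 means the predicted template matches the reference exactly, and values closer to 1 mean that most characters would need to be edited.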

Taxonomy

Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Business Process Modeling and Analysis