A Word is Worth 4-bit: Efficient Log Parsing with Binary Coded Decimal Recognition
Prerak Srivastava, Giulio Corallo, Sergey Rybalko

TL;DR
This paper introduces a character-level neural log parser that uses binary-coded decimal recognition to extract highly detailed log templates, matching LLM-based parsers in accuracy while consuming fewer resources.
Contribution
It presents a new neural architecture for log parsing that captures fine-grained details using binary-coded decimals, improving accuracy and efficiency over existing parsers.
Findings
Matches LLM-based parsers in accuracy
Outperforms semantic parsers in efficiency
Effective on industrial and benchmark datasets
Abstract
System-generated logs are typically converted into categorical log templates through parsing. These templates are crucial for generating actionable insights in various downstream tasks. However, existing parsers often fail to capture fine-grained template details, leading to suboptimal accuracy and reduced utility in downstream tasks that require precise pattern identification. We propose a character-level log parser built on a novel neural architecture that aggregates character embeddings. Our approach estimates a sequence of binary-coded decimals to achieve highly granular log template extraction. Our low-resource character-level parser, tested on revised Loghub-2k and a manually annotated industrial dataset, matches LLM-based parsers in accuracy while outperforming semantic parsers in efficiency.
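For readers unfamiliar with the encoding named in the title, the sketch below shows standard binary-coded decimal (BCD): each decimal digit is written as its own 4-bit group. It illustrates the encoding only. The abstract does not say how the predicted BCD sequence maps onto template fields, so nothing here reproduces the authors' architecture, and the helper names `to_bcd` and `from_bcd` are hypothetical.

```python
from typing import List


def to_bcd(number: int) -> List[List[int]]:
    """Encode a non-negative integer as one 4-bit group per decimal digit.

    Illustrative helper (hypothetical name), not taken from the paper.
    Bits are listed most significant first.
    """
    if number < 0:
        raise ValueError("this sketch covers non-negative integers only")
    return [
        [(int(digit) >> shift) & 1 for shift in (3, 2, 1, 0)]
        for digit in str(number)
    ]


def from_bcd(nibbles: List[List[int]]) -> int:
    """Decode a list of 4-bit groups back into an integer."""
    digits = []
    for bits in nibbles:
        value = bits[0] * 8 + bits[1] * 4 + bits[2] * 2 + bits[3]
        if value > 9:
            # Bit patterns 1010 through 1111 do not name a decimal digit.
            raise ValueError(f"invalid BCD group: {bits}")
        digits.append(str(value))
    return int("".join(digits))


print(to_bcd(42))            # [[0, 1, 0, 0], [0, 0, 1, 0]]
print(from_bcd(to_bcd(42)))  # 42
```

Four bits can represent sixteen values but a decimal digit needs only ten, so six bit patterns per group are unused; a decoder has to reject them, as `from_bcd` does.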
Taxonomy
Topics: Algorithms and Data Compression · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
