DeepParse: Hybrid Log Parsing with LLM-Synthesized Regex Masks
Amir Shetaia, Sean Kauffman

TL;DR
DeepParse is a hybrid log parsing framework that leverages LLMs to automatically identify variable patterns and applies deterministic algorithms for scalable, accurate, and cost-effective log structuring.
Contribution
It introduces a novel hybrid approach combining LLMs and deterministic algorithms to improve log parsing accuracy and efficiency without brittle rules or high inference costs.
Findings
Achieves 97.6% average parsing accuracy across 16 datasets.
Reduces false alarms in anomaly detection by over 30%.
Decreases inference latency by 36% compared to heuristic baselines.
Abstract
Modern distributed systems produce massive, heterogeneous logs essential for reliability, security, and anomaly detection. Converting these free-form messages into structured templates (log parsing) is challenging due to evolving formats and limited labeled data. Machine-learning-based parsers like Drain are fast but accuracy often degrades on complex variables, while Large Language Models (LLMs) offer better generalization but incur prohibitive inference costs. This paper presents DeepParse, a hybrid framework that automatically mines reusable variable patterns from small log samples using an LLM, then applies them deterministically through the Drain algorithm. By separating the reasoning phase from execution, DeepParse enables accurate, scalable, and cost-efficient log structuring without relying on brittle handcrafted rules or per-line neural inference. Across 16 benchmark datasets,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
