Semantic-Aware Parsing for Security Logs

Julien Piet; Vivian Fang; Rishi Khare; Scott Coull; Vern Paxson; Raluca Ada Popa; David Wagner

arXiv:2506.17512·cs.CR·November 18, 2025

TL;DR

Matryoshka uses large language models to automatically generate semantic log parsers from unlabeled security logs, enabling scalable, accurate, and automated security log analysis without manual parser creation.

Contribution

The paper introduces Matryoshka, a system that automatically infers and generates security log parsers using LLMs without labeled data, improving scalability and reducing manual effort.

Findings

1. Outperforms prior syntax parsing methods on benchmark datasets.

2. Matches human-generated parsers in security query retrieval tasks.

3. Reduces manual effort in security log data normalization.

Abstract

Security logs are foundational to threat detection and post-incident investigation, yet analysts often struggle to fully leverage them due to their heterogeneity and unstructured nature. The standard practice of manually writing parsers to normalize the data in security event management systems is time-consuming and costly due to the long tail of log formats. Meanwhile, querying raw logs without explicit parsing using large language models (LLMs) is impractical at scale. In this paper, we introduce Matryoshka, an end-to-end system leveraging LLMs to automatically generate semantically-aware structured log parsers without labeled examples or human intervention. Matryoshka achieves this by directly inferring log syntax, variable naming, and normalization to common security-specific schemas (e.g., OCSF [1]) from unlabeled log line samples, then generating deterministic parsers and mapping…
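To make the idea of a deterministic parser plus a schema mapping concrete, here is a minimal Python sketch. It is not Matryoshka's generated output or its API: the regular expression, the variable names, the `parse_line` helper, and the OCSF-style dotted field paths are all assumptions chosen for illustration, and the sample line is a typical OpenSSH-style failure message rather than data from the paper.

```python
import re

# Hypothetical syntax template for one log format. In the paper's setting a
# template like this would be inferred from unlabeled samples; here it is
# written by hand purely as an illustration.
TEMPLATE = re.compile(
    r"Failed password for (?P<user>\S+) "
    r"from (?P<src_ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"port (?P<src_port>\d+)"
)

# Illustrative mapping from the template's variable names to OCSF-style
# dotted attribute paths (assumed for this sketch, not taken from the paper).
FIELD_MAP = {
    "user": "user.name",
    "src_ip": "src_endpoint.ip",
    "src_port": "src_endpoint.port",
}


def parse_line(line):
    """Return a normalized record for a matching line, or None otherwise."""
    match = TEMPLATE.search(line)
    if match is None:
        return None
    record = {}
    for var, value in match.groupdict().items():
        # Ports are stored as integers; every other field stays a string.
        record[FIELD_MAP[var]] = int(value) if var == "src_port" else value
    return record


if __name__ == "__main__":
    sample = "Failed password for root from 203.0.113.7 port 22 ssh2"
    print(parse_line(sample))
```

Once a template and mapping like these exist, parsing needs no model calls at query time, which is the property that makes the approach practical at scale compared with querying raw logs through an LLM.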

Code & Models

Repositories

julien-piet/matryoshka (official)

Taxonomy

Topics: Software System Performance and Reliability · Network Security and Intrusion Detection · Data Visualization and Analytics