Parser-Free Querying of Security Logs
Evan Luo, Julien Piet, David Wagner

TL;DR
Sieve generates executable queries over raw security logs directly from analysts' natural-language questions, reducing the need for per-source parsers and hand-written scripts.
Contribution
Sieve grounds a large language model with lightweight, automatically extracted log-format context, so that a single LLM call per query produces executable query code that is then run deterministically over the raw logs.
Findings
Over 3x reduction in error rate on complex queries
Largest gains on multi-line correlation tasks
Effective bridging of structured querying and raw log analysis
Abstract
Security analysts routinely query system logs to detect threats and investigate incidents, but each log source uses its own semi-structured format: logs are cheap to produce, but expensive to use. The standard approach, building per-source parsers to normalize logs into structured schemas, is powerful but requires continuous engineering effort for each new format. Querying raw logs directly with tools like grep avoids this cost, but requires analysts to know each source's message variants and cannot express the multi-line temporal queries that security investigations demand. We present Sieve, a system that generates executable query code from natural-language security questions by grounding a large language model with lightweight, automatically extracted log-format context, requiring only one LLM call per query followed by deterministic execution. Evaluating 133 security queries across…
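To make "multi-line temporal query" concrete, the following is a minimal sketch of the kind of deterministic query code that grep cannot express: flag a successful login that follows several failed attempts from the same address within a short window. It is not taken from the paper and is not Sieve's output; the sshd-style log format, the sample lines, the function name `success_after_failures`, and the thresholds are all assumptions chosen for illustration.

```python
import re
from datetime import datetime, timedelta

# Hypothetical sshd-style log lines; the format is an assumption for illustration.
SAMPLE_LOG = """\
2024-03-01T10:00:01 sshd[101]: Failed password for root from 10.0.0.5 port 4242
2024-03-01T10:00:03 sshd[101]: Failed password for root from 10.0.0.5 port 4243
2024-03-01T10:00:05 sshd[101]: Failed password for root from 10.0.0.5 port 4244
2024-03-01T10:00:09 sshd[102]: Accepted password for root from 10.0.0.5 port 4245
2024-03-01T10:05:00 sshd[103]: Failed password for alice from 10.0.0.9 port 5000
2024-03-01T10:30:00 sshd[104]: Accepted password for alice from 10.0.0.9 port 5001
"""

# Matches only the two message variants used in this sketch.
LINE_RE = re.compile(
    r"^(?P<ts>\S+) sshd\[\d+\]: (?P<result>Failed|Accepted) password "
    r"for (?P<user>\S+) from (?P<ip>\S+) port \d+$"
)


def success_after_failures(log_text, min_failures=3, window=timedelta(minutes=1)):
    """Return (ip, user, timestamp) for each accepted login that was preceded by
    at least `min_failures` failed attempts from the same IP within `window`."""
    recent_failures = {}  # ip -> timestamps of failures since the last success
    hits = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines in other formats
        ts = datetime.fromisoformat(m["ts"])
        ip = m["ip"]
        if m["result"] == "Failed":
            recent_failures.setdefault(ip, []).append(ts)
        else:
            in_window = [t for t in recent_failures.get(ip, []) if ts - t <= window]
            if len(in_window) >= min_failures:
                hits.append((ip, m["user"], ts))
            recent_failures.pop(ip, None)  # reset the counter after a success
    return hits


if __name__ == "__main__":
    for ip, user, ts in success_after_failures(SAMPLE_LOG):
        print(f"{ts.isoformat()} suspicious login: user={user} ip={ip}")
```

The query must keep state across lines (per-address failure timestamps) and compare times, which is why a single-line pattern match is not enough. Once such code exists, running it involves no further model calls, which matches the abstract's description of one LLM call followed by deterministic execution.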
