When the Ruler is Broken: Parsing-Induced Suppression in LLM-Based Security Log Evaluation

Chaitanya Vilas Garware and Sharif Noor Zisad

arXiv:2605.07293 · cs.CR · May 11, 2026

TL;DR

This paper shows that the output parser used during evaluation can dominate the measured performance of LLM-based security log classifiers. On identical model outputs, a strict regex parser reports 0% threat accuracy while a fuzzy parser recovers 76%, exposing a flaw in the evaluation methodology rather than in the model.

Contribution

It identifies parsing-induced suppression as a systematic evaluation error and introduces SOC-Bench v0, a benchmark framework to standardize threat classification and improve evaluation reliability.
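The summary does not describe what SOC-Bench v0's schema looks like. As a hypothetical illustration of one ingredient a threat-classification benchmark typically needs, the Python sketch below maps free-form label variants onto a fixed canonical taxonomy. The three class names come from the findings; the alias sets, `CANONICAL_LABELS`, and `canonical_label` are assumptions made for illustration and are not part of SOC-Bench.

```python
import re
from typing import Optional

# Hypothetical canonical taxonomy. Only the classes named in the findings are
# listed; the alias sets are illustrative assumptions, not a published schema.
CANONICAL_LABELS = {
    "reconnaissance": {"reconnaissance", "recon", "port_scan", "scanning"},
    "brute_force": {"brute_force", "bruteforce", "password_guessing"},
    "credential_stuffing": {"credential_stuffing", "credentialstuffing"},
}

# Invert to alias -> canonical label for constant-time lookup.
ALIAS_TO_LABEL = {
    alias: label for label, aliases in CANONICAL_LABELS.items() for alias in aliases
}


def canonical_label(raw: str) -> Optional[str]:
    """Map a free-form label to its canonical class, or None if unrecognized."""
    # Lowercase, collapse runs of non-alphanumerics to "_", trim edge underscores.
    key = re.sub(r"[^a-z0-9]+", "_", raw.lower()).strip("_")
    return ALIAS_TO_LABEL.get(key)


for raw in ("Brute-Force", "Port Scan", " Credential Stuffing ", "DDoS"):
    print(repr(raw), "->", canonical_label(raw))
```

Unrecognized strings return `None` rather than a guessed class, so unmapped labels remain visible as errors instead of being silently coerced into the taxonomy.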

Findings

01

The strict regex parser reported 0% threat accuracy; the fuzzy parser recovered 76% on the same model outputs.

02

Severity accuracy stayed at 58% under both parsers, acting as a built-in control that attributes the threat-accuracy gap to a field-name format mismatch rather than to the model.

03

Residual errors were concentrated in reconnaissance, brute-force, and credential-stuffing logs.
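Why the unchanged severity score works as a control can be shown with a toy scoring loop. This is a minimal sketch, assuming a hypothetical four-example evaluation set and the simplification that the strict parser loses only the threat field; the printed numbers are illustrative and are not the paper's results. Because both parsed views derive from the same model answers, a field both parsers read identically (severity) scores the same under both, while the field the strict parser cannot read collapses to zero.

```python
from typing import Optional

# Toy, hypothetical evaluation set (not the paper's data).
gold = [
    {"threat": "brute_force", "severity": "high"},
    {"threat": "reconnaissance", "severity": "medium"},
    {"threat": "credential_stuffing", "severity": "high"},
    {"threat": "benign", "severity": "low"},
]

# What the model actually answered for each example (also hypothetical).
model_answers = [
    {"threat": "brute_force", "severity": "high"},
    {"threat": "brute_force", "severity": "low"},  # reconnaissance mislabeled
    {"threat": "credential_stuffing", "severity": "medium"},
    {"threat": "benign", "severity": "low"},
]

# Strict view: the threat field name never matches, so it is never extracted,
# while severity is read correctly. Fuzzy view: both fields are recovered.
strict_view = [{"threat": None, "severity": a["severity"]} for a in model_answers]
fuzzy_view = [dict(a) for a in model_answers]


def field_accuracy(preds: list[dict[str, Optional[str]]],
                   labels: list[dict[str, str]], field: str) -> float:
    """Share of examples whose parsed field equals the gold label.

    A failed parse (None) is scored as a wrong answer, which is how a
    parser bug silently turns into an apparent model error.
    """
    hits = sum(1 for p, g in zip(preds, labels) if p.get(field) == g[field])
    return hits / len(labels)


for field in ("threat", "severity"):
    print(field,
          field_accuracy(strict_view, gold, field),
          field_accuracy(fuzzy_view, gold, field))
# threat 0.0 0.75
# severity 0.5 0.5
```

The control is informative only because severity parsing is identical in both views: had severity also moved, the difference could not be attributed to threat-field parsing alone.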

Abstract

LLM-based SOC log classifiers are commonly evaluated using regular-expression pipelines that extract structured fields from free-form model output. We demonstrate that this practice introduces a class of silent, systematic evaluation errors, which we term parsing-induced suppression, that can make a fully functional model appear completely non-functional. Using OpenSOC-AI, a LoRA fine-tuned TinyLlama-1.1B system for security log threat classification, as a reproducible case study, we show that a strict regex parser reported 0% threat accuracy while a corrected fuzzy parser recovered 76% threat accuracy on the same model outputs and the same evaluation set, a gap of 76 percentage points attributable entirely to evaluation methodology. Severity accuracy remained constant at 58% under both parsers, providing a built-in control that isolates field name format mismatch as the causal…
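The failure mode described in the abstract can be reproduced in a few lines. The sketch below assumes a hypothetical output format in which the model labels its answer `Threat Type:` while the strict parser expects `threat:`; the sample text, `FIELDS`, `strict_parse`, and `fuzzy_parse` are illustrative and are not taken from OpenSOC-AI. Both parsers read the same string, yet only the fuzzy one recovers the threat field, and both agree on severity.

```python
import re
from typing import Optional

# Hypothetical model output: the label reads "Threat Type:" rather than
# the exact "threat:" key the strict parser was written against.
SAMPLE_OUTPUT = """Threat Type: Brute Force
Severity: High
Reasoning: 40 failed SSH logins from one source IP within 60 seconds."""

FIELDS = ("threat", "severity")


def strict_parse(text: str) -> dict[str, Optional[str]]:
    """Accept only lines that begin with the exact field name and a colon."""
    parsed: dict[str, Optional[str]] = {}
    for name in FIELDS:
        m = re.search(rf"^{name}:\s*(.+)$", text, re.MULTILINE | re.IGNORECASE)
        parsed[name] = m.group(1).strip() if m else None
    return parsed


def _normalize_key(key: str) -> str:
    # Lowercase and collapse any run of non-alphanumerics to "_".
    return re.sub(r"[^a-z0-9]+", "_", key.lower()).strip("_")


def fuzzy_parse(text: str) -> dict[str, Optional[str]]:
    """Accept any 'key: value' line whose normalized key contains the field name."""
    parsed: dict[str, Optional[str]] = {name: None for name in FIELDS}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key/value line
        nkey = _normalize_key(key)
        for name in FIELDS:
            if parsed[name] is None and name in nkey:
                parsed[name] = value.strip()  # first matching line wins
    return parsed


print(strict_parse(SAMPLE_OUTPUT))  # {'threat': None, 'severity': 'High'}
print(fuzzy_parse(SAMPLE_OUTPUT))   # {'threat': 'Brute Force', 'severity': 'High'}
```

Substring matching on normalized keys is deliberately permissive: it would also match a key such as `No Threat Detected`. A production parser would more likely use an explicit alias list and report every field it fails to extract, instead of letting a missing field be scored as a wrong answer.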
