When Scanners Lie: Evaluator Instability in LLM Red-Teaming

Lidor Erez; Omer Hofman; Tamir Nizri; Roman Vainshtein

arXiv:2603.14633 · cs.CR · March 17, 2026

Open Access

TL;DR

This paper reveals that evaluator bias significantly affects the reliability of LLM vulnerability assessments and introduces a framework to improve evaluation consistency and accuracy.

Contribution

It presents a two-phase, reliability-aware evaluation framework that quantifies evaluator disagreement and employs verification to enhance assessment reliability.

Findings

1. 22 of 25 attack categories show evaluator instability
2. Evaluator accuracy improved from 72% to 89%
3. Vulnerability scores can vary by up to 33% depending on evaluator

Abstract

Automated LLM vulnerability scanners are increasingly used to assess security risks by measuring attack success rates (ASR) across different attack types. Yet the validity of these measurements hinges on an often-overlooked component: the evaluator that determines whether an attack has succeeded. In this study, we demonstrate that commonly used open-source scanners exhibit measurement instability that depends on the evaluator component. Consequently, changing the evaluator while keeping the attacks and model outputs constant can significantly alter the reported ASR. To tackle this problem, we present a two-phase, reliability-aware evaluation framework. In the first phase, we quantify evaluator disagreement to identify attack categories where ASR reliability cannot be assumed. In the second phase, we propose a verification-based evaluation method where evaluators are validated by an independent…
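The first phase described above, measuring how much the reported ASR moves when only the evaluator changes, can be illustrated with a minimal sketch. This is not the authors' implementation: the excerpt does not say which disagreement statistic the paper uses, so the ASR spread and pairwise Cohen's kappa below are assumed stand-ins, and the evaluator names and verdict lists are hypothetical data for a single attack category.

```python
from itertools import combinations


def asr(verdicts):
    """Attack success rate: fraction of attempts judged successful (1)."""
    return sum(verdicts) / len(verdicts)


def cohen_kappa(a, b):
    """Chance-corrected agreement between two binary verdict lists."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1.0:
        # Both evaluators gave the same constant verdict on every item.
        return 1.0
    return (observed - expected) / (1 - expected)


def disagreement_report(verdicts_by_evaluator):
    """Per-evaluator ASR, max-min ASR spread, and pairwise kappa."""
    rates = {name: asr(v) for name, v in verdicts_by_evaluator.items()}
    spread = max(rates.values()) - min(rates.values())
    kappas = {
        (m, n): cohen_kappa(verdicts_by_evaluator[m], verdicts_by_evaluator[n])
        for m, n in combinations(verdicts_by_evaluator, 2)
    }
    return rates, spread, kappas


# Hypothetical verdicts: the SAME eight model outputs for one attack
# category, judged by three different evaluators (1 = attack succeeded).
verdicts = {
    "evaluator_a": [1, 1, 0, 1, 0, 0, 1, 1],
    "evaluator_b": [1, 0, 0, 1, 0, 0, 0, 1],
    "evaluator_c": [1, 1, 1, 1, 0, 1, 1, 1],
}

rates, spread, kappas = disagreement_report(verdicts)
print(rates)
print(spread)
print(kappas)
```

Because the attacks and model outputs are held fixed, any non-zero spread in this sketch comes from the evaluators alone. A category with a large spread or low kappa would be one where, in the paper's terms, ASR reliability cannot be assumed.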

Taxonomy

Topics: Information and Cyber Security · Web Application Security Vulnerabilities · Software Engineering Research