AgentAuditor: Human-Level Safety and Security Evaluation for LLM Agents

Hanjun Luo; Shenyu Dai; Chiming Ni; Xinfeng Li; Guibin Zhang; Kun Wang; Tongliang Liu; Hanan Salam

arXiv:2506.00641·cs.AI·February 3, 2026

TL;DR

AgentAuditor is a training-free framework that uses memory-augmented reasoning to help LLM-based evaluators assess the safety and security of agents with human-level accuracy. The paper also introduces a new benchmark.

Contribution

The paper introduces a universal, memory-augmented reasoning framework for LLM evaluators and presents ASSEBench, a benchmark for safety and security assessment.

Findings

01. AgentAuditor outperforms existing evaluators in safety/security detection.

02. Achieves human-level accuracy in safety and security evaluation.

03. Sets new state-of-the-art performance on ASSEBench.

Abstract

Despite the rapid advancement of LLM-based agents, the reliable evaluation of their safety and security remains a significant challenge. Existing rule-based or LLM-based evaluators often miss dangers in agents' step-by-step actions, overlook subtle meanings, fail to see how small issues compound, and get confused by unclear safety or security rules. To overcome this evaluation crisis, we introduce AgentAuditor, a universal, training-free, memory-augmented reasoning framework that empowers LLM evaluators to emulate human expert evaluators. AgentAuditor constructs an experiential memory by having an LLM adaptively extract structured semantic features (e.g., scenario, risk, behavior) and generate associated chain-of-thought reasoning traces for past interactions. A multi-stage, context-aware retrieval-augmented generation process then dynamically retrieves the most relevant reasoning…
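The memory-and-retrieval idea described in the abstract can be illustrated with a small toy sketch. This is not the paper's implementation: the class and function names (Features, MemoryEntry, retrieve, build_prompt) are hypothetical, the feature fields simply reuse the abstract's examples (scenario, risk, behavior), and plain token-overlap (Jaccard) similarity stands in for whatever retrieval method the authors actually use. In a real system the features and reasoning traces would be produced by an LLM; here they are hand-written strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Features:
    """Hypothetical structured features of one agent interaction."""
    scenario: str
    risk: str
    behavior: str

    def tokens(self) -> frozenset:
        # Lowercased bag of words over all three fields.
        text = f"{self.scenario} {self.risk} {self.behavior}"
        return frozenset(text.lower().split())


@dataclass(frozen=True)
class MemoryEntry:
    """A past interaction with its stored reasoning trace and verdict."""
    features: Features
    reasoning: str  # stored chain-of-thought trace (hand-written here)
    label: str      # "safe" or "unsafe"


def jaccard(a: frozenset, b: frozenset) -> float:
    """Token-overlap similarity; an assumed stand-in for the real retriever."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(memory: list, query: Features, k: int = 2) -> list:
    """Return the k memory entries most similar to the query features."""
    q = query.tokens()
    ranked = sorted(
        memory,
        key=lambda entry: jaccard(entry.features.tokens(), q),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: Features, examples: list) -> str:
    """Assemble a few-shot prompt from the retrieved reasoning traces."""
    lines = ["Judge whether the new interaction is safe or unsafe.", ""]
    for i, ex in enumerate(examples, start=1):
        lines.append(f"Example {i}: {ex.features.behavior}")
        lines.append(f"Reasoning: {ex.reasoning}")
        lines.append(f"Verdict: {ex.label}")
        lines.append("")
    lines.append(f"New interaction: {query.behavior}")
    return "\n".join(lines)
```

The resulting prompt would then be passed to an LLM evaluator, which is left out here because the abstract does not specify that step.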

Taxonomy

Topics: Smart Grid Security and Resilience