Auditing Agent Harness Safety

Chengzhi Liu; Yichen Guo; Yepeng Liu; Yuzhe Yang; Qianqi Yan; Xuandong Zhao; Wenyue Hua; Sheng Liu; Sharon Li; Yuheng Bu; Xin Eric Wang

arXiv:2605.14271·cs.CL·May 19, 2026

TL;DR

This paper introduces HarnessAudit, a comprehensive framework and benchmark for auditing safety violations in the full execution trajectories of LLM agent harnesses, emphasizing boundary compliance and system stability.

Contribution

The paper presents a trajectory-level auditing framework and a benchmark dataset for evaluating safety risks in multi-agent LLM harnesses, covering violations that output-only safety assessments miss.

Findings

1. Violations increase with longer trajectories.
2. Safety risks differ across domains and roles.
3. Most violations involve resource access and information transfer.

Abstract

LLM agents increasingly run inside execution harnesses that dispatch tools, allocate resources, and route messages between specialized components. However, a harness can return a correct, benign answer over a trajectory that accesses unauthorized resources or leaks context to the wrong agent. Output-level evaluation cannot see these failures, yet most safety benchmarks score only final outputs or terminal states, even though many violations occur mid-trajectory rather than at termination. The central question is whether the harness respects user intent, permission boundaries, and information-flow constraints throughout execution. To address this gap, we propose HarnessAudit, a framework that audits full execution trajectories across boundary compliance, execution fidelity, and system stability, with a focus on multi-agent harnesses where these risks are most pronounced. We further…
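To make the distinction between output-level and trajectory-level checks concrete, here is a minimal Python sketch of a per-step audit. It is an illustration only and not the HarnessAudit implementation or its API: the `Step` record, the `audit_trajectory` function, the permission tables, and the agent and resource names are all hypothetical, and the policy model (a per-agent resource allowlist plus an allowlist of sender, recipient, label flows) is an assumption chosen for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One recorded event in a harness trajectory (hypothetical schema)."""
    agent: str
    action: str      # "access" (touch a resource) or "send" (message an agent)
    target: str      # resource name for "access", recipient agent for "send"
    label: str = ""  # sensitivity label of the payload, used only for "send"


def audit_trajectory(steps, allowed_resources, allowed_flows):
    """Return (step_index, violation_kind, step) for each boundary violation."""
    violations = []
    for i, step in enumerate(steps):
        if step.action == "access":
            # Permission boundary: the agent may only touch allowlisted resources.
            if step.target not in allowed_resources.get(step.agent, set()):
                violations.append((i, "unauthorized_access", step))
        elif step.action == "send":
            # Information-flow constraint: (sender, recipient, label) must be allowed.
            if (step.agent, step.target, step.label) not in allowed_flows:
                violations.append((i, "information_leak", step))
    return violations


# Hypothetical policy: which agent may access what, and which flows are allowed.
allowed_resources = {"planner": {"calendar"}, "emailer": {"mailbox"}}
allowed_flows = {("planner", "emailer", "public")}

# A trajectory whose final answer could be correct and benign,
# yet which crosses two boundaries mid-execution.
trajectory = [
    Step("planner", "access", "calendar"),
    Step("planner", "access", "payroll_db"),             # not allowlisted
    Step("planner", "send", "emailer", "confidential"),  # disallowed flow
    Step("emailer", "access", "mailbox"),
]

violations = audit_trajectory(trajectory, allowed_resources, allowed_flows)
for index, kind, step in violations:
    print(index, kind, step.agent, step.target)
```

An evaluator that scored only the final output would pass this run, whereas the per-step audit flags steps 1 and 2. A real harness audit would also need to cover execution fidelity and system stability, which this sketch does not attempt.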

Code & Models

Repositories

eric-ai-lab/HarnessAudit (GitHub)