SafeHarness: Lifecycle-Integrated Security Architecture for LLM-based Agent Deployment

Xixun Lin; Yang Liu; Yancheng Chen; Yongxuan Wu; Yucheng Ning; Yilong Liu; Nan Sun; Shun Zhang; Bin Chong; Chuan Zhou; Yanan Cao

arXiv:2604.13630·cs.CR·May 12, 2026

SafeHarness: Lifecycle-Integrated Security Architecture for LLM-based Agent Deployment

Xixun Lin, Yang Liu, Yancheng Chen, Yongxuan Wu, Yucheng Ning, Yilong Liu, Nan Sun, Shun Zhang, Bin Chong, Chuan Zhou, Yanan Cao

PDF

TL;DR

SafeHarness is a comprehensive security architecture for LLM-based agents that integrates multiple defense layers into the agent lifecycle to mitigate attacks and reduce unsafe behaviors.

Contribution

It introduces a novel multi-layered security framework that addresses structural security gaps in LLM agent deployment, with integrated verification, privilege control, and rollback mechanisms.

Findings

01

Achieves 38% reduction in unsafe behavior rate.

02

Achieves 42% reduction in attack success rate.

03

Maintains core task utility while enhancing security.

Abstract

The performance of large language model (LLM) agents depends critically on the execution harness, the system layer that orchestrates tool use, context management, and state persistence. Yet this same architectural centrality makes the harness a high-value attack surface: a single compromise at the harness level can cascade through the entire execution pipeline. We observe that existing security approaches suffer from structural mismatch, leaving them blind to harness-internal state and unable to coordinate across the different phases of agent operation. In this paper, we introduce \safeharness{}, a security architecture in which four proposed defense layers are woven directly into the agent lifecycle to address above significant limitations: adversarial context filtering at input processing, tiered causal verification at decision making, privilege-separated tool control at action…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.