SafeHarness: Lifecycle-Integrated Security Architecture for LLM-based Agent Deployment
Xixun Lin, Yang Liu, Yancheng Chen, Yongxuan Wu, Yucheng Ning, Yilong Liu, Nan Sun, Shun Zhang, Bin Chong, Chuan Zhou, Yanan Cao

TL;DR
SafeHarness is a comprehensive security architecture for LLM-based agents that integrates multiple defense layers into the agent lifecycle to mitigate attacks and reduce unsafe behaviors.
Contribution
It introduces a novel multi-layered security framework that addresses structural security gaps in LLM agent deployment, with integrated verification, privilege control, and rollback mechanisms.
Findings
Achieves a 38% reduction in the unsafe behavior rate.
Achieves a 42% reduction in the attack success rate.
Maintains core task utility while enhancing security.
Abstract
The performance of large language model (LLM) agents depends critically on the execution harness, the system layer that orchestrates tool use, context management, and state persistence. Yet this same architectural centrality makes the harness a high-value attack surface: a single compromise at the harness level can cascade through the entire execution pipeline. We observe that existing security approaches suffer from a structural mismatch, leaving them blind to harness-internal state and unable to coordinate across the different phases of agent operation. In this paper, we introduce SafeHarness, a security architecture in which four proposed defense layers are woven directly into the agent lifecycle to address these limitations: adversarial context filtering at input processing, tiered causal verification at decision making, privilege-separated tool control at action…
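To make the idea of lifecycle-integrated defenses concrete, the following is a minimal sketch of a harness that runs a check at each of the three stages the abstract names: input processing, decision making, and action. It is not the authors' implementation. Every name in it (GuardedHarness, ToolCall, Blocked, the marker list, the allow-list) is hypothetical, and each check is a deliberately simple stand-in for the real mechanism. The fourth layer is cut off in the abstract as shown here, so the sketch does not attempt to reproduce it; the state snapshot only gestures at the rollback mechanism mentioned under Contribution.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


class Blocked(Exception):
    """Raised when a defense layer refuses to let a step proceed."""


@dataclass
class ToolCall:
    tool: str
    argument: str


@dataclass
class GuardedHarness:
    # All fields are illustrative assumptions, not part of SafeHarness itself.
    tools: Dict[str, Callable[[str], str]]
    allowed_tools: Set[str]
    suspicious_markers: List[str] = field(
        default_factory=lambda: ["ignore previous instructions"]
    )
    state: List[str] = field(default_factory=list)

    def filter_context(self, text: str) -> str:
        # Input processing: drop any line containing a known injection marker.
        kept = [
            line
            for line in text.splitlines()
            if not any(m in line.lower() for m in self.suspicious_markers)
        ]
        return "\n".join(kept)

    def verify_decision(self, call: ToolCall, context: str) -> None:
        # Decision making: toy stand-in for verification. The argument of the
        # proposed call must be grounded in the filtered context.
        if call.argument not in context:
            raise Blocked(f"argument {call.argument!r} not grounded in context")

    def execute(self, call: ToolCall) -> str:
        # Action: privilege check against an allow-list before any tool runs.
        if call.tool not in self.allowed_tools:
            raise Blocked(f"tool {call.tool!r} is not permitted")
        snapshot = list(self.state)  # copy of state taken before the action
        try:
            result = self.tools[call.tool](call.argument)
            self.state.append(result)
            return result
        except Exception:
            self.state = snapshot  # restore the pre-action state on failure
            raise

    def step(self, raw_context: str, decide: Callable[[str], ToolCall]) -> str:
        # One pass through the lifecycle: filter, decide, verify, act.
        context = self.filter_context(raw_context)
        call = decide(context)
        self.verify_decision(call, context)
        return self.execute(call)
```

In this sketch the `decide` callback plays the role of the LLM. Because the checks live inside `step`, the callback only ever sees filtered context, and its proposed call cannot reach a tool without passing both the verification and the privilege check.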
