Governance Architecture for Autonomous Agent Systems: Threats, Framework, and Engineering Practice
Yuxu Ge

TL;DR
This paper introduces a four-layer governance framework for autonomous agents powered by large language models. It targets execution-layer vulnerabilities, evaluates the framework on a new bilingual benchmark, and reports that the full pipeline intercepts 96% of threats.
Contribution
The paper proposes the Layered Governance Architecture (LGA), a four-layer framework, and builds a bilingual benchmark of 1,081 tool-call samples to evaluate it against prompt injection, RAG poisoning, and malicious skill plugins.
Findings
Layer 2 intent verification intercepts 93-98.5% of malicious calls.
The full pipeline achieves 96% threat interception with approximately 980 ms latency.
The framework generalizes well, achieving 99-100% interception on external benchmarks.
Abstract
Autonomous agents powered by large language models introduce a class of execution-layer vulnerabilities -- prompt injection, retrieval poisoning, and uncontrolled tool invocation -- that existing guardrails fail to address systematically. In this work, we propose the Layered Governance Architecture (LGA), a four-layer framework comprising execution sandboxing (L1), intent verification (L2), zero-trust inter-agent authorization (L3), and immutable audit logging (L4). To evaluate LGA, we construct a bilingual benchmark (Chinese original, English via machine translation) of 1,081 tool-call samples -- covering prompt injection, RAG poisoning, and malicious skill plugins -- and apply it to OpenClaw, a representative open-source agent framework. Experimental results on Layer 2 intent verification with four local LLM judges (Qwen3.5-4B, Llama-3.1-8B, Qwen3.5-9B, Qwen2.5-14B) and one cloud…
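The four layers named in the abstract can be pictured as a chain of checks that every tool call passes through before it runs. The following is a minimal sketch under stated assumptions, not the paper's implementation: every name here (ToolCall, ALLOWED_TOOLS, AGENT_PERMISSIONS, keyword_judge, AuditLog, govern) is hypothetical, a tool allowlist stands in for a real sandbox at L1, a keyword match stands in for an LLM judge at L2, and a SHA-256 hash chain is one common way to make an L4 log tamper-evident.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class ToolCall:
    agent_id: str      # which agent issued the call
    tool: str          # tool name being invoked
    args: dict         # tool arguments
    user_intent: str   # the task the user actually asked for


# Hypothetical policy data, for illustration only.
ALLOWED_TOOLS = {"read_file", "search_web"}
AGENT_PERMISSIONS = {"planner": {"search_web"}, "worker": {"read_file", "search_web"}}


def l1_sandbox(call):
    # L1 stand-in: an allowlist instead of a real execution sandbox.
    return call.tool in ALLOWED_TOOLS


def keyword_judge(call):
    # L2 stand-in: a real deployment would ask an LLM judge whether the call
    # matches call.user_intent; this toy only looks for an injection phrase.
    return "ignore previous instructions" not in json.dumps(call.args).lower()


def l3_authz(call):
    # L3 stand-in: per-agent permissions, denying by default.
    return call.tool in AGENT_PERMISSIONS.get(call.agent_id, set())


class AuditLog:
    """L4 stand-in: append-only log where each entry hashes its predecessor."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []
        self._prev = self.GENESIS

    def append(self, record):
        payload = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((self._prev + payload).encode()).hexdigest()
        self.entries.append({"record": record, "prev": self._prev, "hash": digest})
        self._prev = digest
        return digest

    def verify(self):
        # Recompute the chain; any edited record breaks a hash link.
        prev = self.GENESIS
        for entry in self.entries:
            payload = json.dumps(entry["record"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True


def govern(call, judge, log):
    """Run L1-L3 in order, stop at the first failure, and always log to L4."""
    layers = [
        ("L1", lambda: l1_sandbox(call)),
        ("L2", lambda: judge(call)),
        ("L3", lambda: l3_authz(call)),
    ]
    blocked_by = None
    for name, check in layers:
        if not check():
            blocked_by = name
            break
    log.append({"agent": call.agent_id, "tool": call.tool,
                "args": call.args, "blocked_by": blocked_by})
    return blocked_by is None
```

In this sketch a call runs only if govern returns True, and the decision is recorded whether or not the call was allowed, so blocked attempts remain visible for later review. Putting the cheap checks before the judge is a design choice of the sketch, not something the abstract specifies.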
Taxonomy
Topics: Security and Verification in Computing · Adversarial Robustness in Machine Learning · Access Control and Trust
