TL;DR
AgentWard is a comprehensive security architecture designed to protect autonomous AI agents throughout their lifecycle by integrating multiple defense layers and cross-layer coordination.
Contribution
The paper introduces a novel, lifecycle-oriented security architecture with stage-specific controls and a prototype implementation for autonomous AI agents.
Findings
Prototype implementation on OpenClaw demonstrates practical feasibility.
The architecture effectively intercepts threats across different stages.
Provides a blueprint for runtime security in autonomous AI systems.
Abstract
Autonomous AI agents extend large language models into full runtime systems that load skills, ingest external content, maintain memory, plan multi-step actions, and invoke privileged tools. In such systems, security failures rarely remain confined to a single interface; instead, they can propagate across initialization, input processing, memory, decision-making, and execution, often becoming apparent only when harmful effects materialize in the environment. This paper presents AgentWard, a lifecycle-oriented, defense-in-depth architecture that systematically organizes protection across these five stages. AgentWard integrates stage-specific, heterogeneous controls with cross-layer coordination, enabling threats to be intercepted along their propagation paths while safeguarding critical assets. We detail the design rationale and architecture of five coordinated protection layers, and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
