SafeAgent: A Runtime Protection Architecture for Agentic Systems
Hailin Liu, Eugene Ilyushin, Jie Ni, and Min Zhu

TL;DR
SafeAgent is a runtime security architecture for LLM agents that enhances robustness against prompt-injection attacks by managing interaction trajectories and separating governance from risk reasoning.
Contribution
It introduces a novel architecture with a runtime controller and context-aware decision core to improve agent safety and robustness.
Findings
SafeAgent outperforms baseline and guardrail methods in robustness.
Experiments on ASB and InjecAgent demonstrate improved security.
Recovery confidence and policy weighting influence safety-utility trade-offs.
Abstract
Large language model (LLM) agents are vulnerable to prompt-injection attacks that propagate through multi-step workflows, tool interactions, and persistent context, making input-output filtering alone insufficient for reliable protection. This paper presents SafeAgent, a runtime security architecture that treats agent safety as a stateful decision problem over evolving interaction trajectories. The proposed design separates execution governance from semantic risk reasoning through two coordinated components: a runtime controller that mediates actions around the agent loop and a context-aware decision core that operates over persistent session state. The core is formalized as a context-aware advanced machine intelligence and instantiated through operators for risk encoding, utility-cost evaluation, consequence modeling, policy arbitration, and state synchronization. Experiments on Agent…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
