SafeAgent: A Runtime Protection Architecture for Agentic Systems

Hailin Liu; Eugene Ilyushin; Jie Ni; and Min Zhu

arXiv:2604.17562·cs.AI·April 21, 2026

SafeAgent: A Runtime Protection Architecture for Agentic Systems

Hailin Liu, Eugene Ilyushin, Jie Ni, and Min Zhu

PDF

TL;DR

SafeAgent is a runtime security architecture for LLM agents that enhances robustness against prompt-injection attacks by managing interaction trajectories and separating governance from risk reasoning.

Contribution

It introduces a novel architecture with a runtime controller and context-aware decision core to improve agent safety and robustness.

Findings

01

SafeAgent outperforms baseline and guardrail methods in robustness.

02

Experiments on ASB and InjecAgent demonstrate improved security.

03

Recovery confidence and policy weighting influence safety-utility trade-offs.

Abstract

Large language model (LLM) agents are vulnerable to prompt-injection attacks that propagate through multi-step workflows, tool interactions, and persistent context, making input-output filtering alone insufficient for reliable protection. This paper presents SafeAgent, a runtime security architecture that treats agent safety as a stateful decision problem over evolving interaction trajectories. The proposed design separates execution governance from semantic risk reasoning through two coordinated components: a runtime controller that mediates actions around the agent loop and a context-aware decision core that operates over persistent session state. The core is formalized as a context-aware advanced machine intelligence and instantiated through operators for risk encoding, utility-cost evaluation, consequence modeling, policy arbitration, and state synchronization. Experiments on Agent…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.