Agent-Sentry: Bounding LLM Agents via Execution Provenance
Rohan Sequeira, Stavros Damianakis, Umar Iqbal, and Konstantinos Psounis

TL;DR
Agent Sentry is a runtime defense for LLM agents. It learns a bound on benign behavior from prior legitimate executions, applies layered checks to flag potentially malicious actions that fall outside that bound, and does so without altering the agents.
Contribution
It introduces a layered approach to detecting and blocking malicious actions in LLM agents, combining a structural classifier over action sequences and argument provenance, a deterministic allowlist check over sensitive argument values, and an LLM judge.
Findings
Blocked 94.3% of successful injections in experiments.
Allowed 95.1% of benign executions without modifications.
Effectively distinguished between legitimate and malicious actions.
Abstract
Agentic computing systems, while immensely capable, raise serious security, privacy, and safety concerns. A key issue is that the full set of functionalities offered by these systems, combined with their probabilistic execution flows, is not known beforehand. Given this lack of characterization, it is challenging to validate whether a system has successfully carried out the user's intended task or instead executed irrelevant actions, potentially as a consequence of compromise. We present Agent Sentry, a runtime defense that learns a bound on an agent's benign execution from prior legitimate executions and flags any action that falls outside this bound. Agent Sentry layers three complementary checks: a structural classifier over the sequence of actions and the provenance of each function's arguments; a deterministic allowlist check over sensitive argument values; and an LLM judge,…
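The layered design described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the names `Action`, `Bound`, `learn_bound`, and `check` are hypothetical; the learned structural classifier is replaced by plain set membership over observed tool transitions and argument sources; the LLM judge is a caller-supplied function; and the layers are assumed to run in sequence, with any one of them able to flag an action.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

START = "<start>"  # hypothetical marker for the beginning of an execution


@dataclass
class Action:
    tool: str                    # function the agent wants to call
    args: Dict[str, str]         # argument name -> value
    provenance: Dict[str, str]   # argument name -> source, e.g. "user" or "tool_output"


@dataclass
class Bound:
    """Illustrative bound built from legitimate executions (assumed representation)."""
    transitions: Set[Tuple[str, str]] = field(default_factory=set)        # (previous tool, tool)
    arg_sources: Set[Tuple[str, str, str]] = field(default_factory=set)   # (tool, arg, source)
    sensitive: Set[Tuple[str, str]] = field(default_factory=set)          # (tool, arg)
    allowlist: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)


def learn_bound(traces: List[List[Action]], sensitive: Set[Tuple[str, str]]) -> Bound:
    """Record which transitions, argument sources, and sensitive values were seen."""
    bound = Bound(sensitive=set(sensitive))
    for trace in traces:
        prev = START
        for action in trace:
            bound.transitions.add((prev, action.tool))
            for arg, source in action.provenance.items():
                bound.arg_sources.add((action.tool, arg, source))
            for arg, value in action.args.items():
                if (action.tool, arg) in bound.sensitive:
                    bound.allowlist.setdefault((action.tool, arg), set()).add(value)
            prev = action.tool
    return bound


def check(bound: Bound, prev_tool: str, action: Action,
          judge: Callable[[Action], bool]) -> str:
    """Return "allow", or "flag:<layer>" naming the first layer that objects."""
    # Layer 1 (stand-in for the structural classifier): action order and provenance.
    if (prev_tool, action.tool) not in bound.transitions:
        return "flag:structure"
    for arg, source in action.provenance.items():
        if (action.tool, arg, source) not in bound.arg_sources:
            return "flag:structure"
    # Layer 2: deterministic allowlist over sensitive argument values.
    for arg, value in action.args.items():
        key = (action.tool, arg)
        if key in bound.sensitive and value not in bound.allowlist.get(key, set()):
            return "flag:allowlist"
    # Layer 3: judge callback (a real system would query an LLM here).
    if not judge(action):
        return "flag:judge"
    return "allow"


# Hypothetical legitimate trace: read an email, then pay a known recipient.
legit = [[
    Action("read_email", {}, {}),
    Action("send_money", {"recipient": "alice"}, {"recipient": "user"}),
]]
bound = learn_bound(legit, sensitive={("send_money", "recipient")})
```

In this sketch, a `send_money` call whose `recipient` comes from tool output rather than from the user is flagged by the first layer, while an unseen recipient supplied by the user passes the first layer and is caught by the allowlist. Running the cheap deterministic checks before the judge is a design choice of the sketch, not something the source states.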
