Enforcing Benign Trajectories: A Behavioral Firewall for Structured-Workflow AI Agents
Hung Dang

TL;DR
This paper introduces Codename, a behavioral firewall for structured-workflow AI agents that analyzes tool-call sequence telemetry to reduce attack success rates while keeping added latency and benign task failure rates low.
Contribution
Codename is a behavioral anomaly detection system that compiles verified benign tool-call sequences into a deterministic automaton for runtime enforcement.
Findings
Codename reduces the attack success rate (ASR) to 2.2% across three structured workflows.
It outperforms Aegis, a state-of-the-art stateless scanner, which reaches 12.8% ASR.
It adds 2.2 ms of latency per call while keeping the benign failure rate low.
Abstract
Structured-workflow agents driven by large language models execute tool calls against sensitive external environments. We propose Codename, a telemetry-driven behavioral anomaly detection firewall. Drawing on sequence-based intrusion detection, Codename compiles verified benign tool-call telemetry into a parameterized deterministic finite automaton (pDFA). The model defines permitted tool sequences, sequential contexts, and parameter bounds. At runtime, a lightweight gateway enforces these boundaries via a state-transition structural lookup, shifting computationally expensive analysis entirely offline. Evaluated on the Agent Security Bench (ASB), Codename achieves a 5.6% macro-averaged attack success rate (ASR) across five scenarios. Within three structured workflows, ASR drops to 2.2%, outperforming Aegis, a state-of-the-art stateless scanner, at 12.8%. Codename…
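To make the offline-compile and runtime-lookup split concrete, here is a minimal Python sketch. It is not the paper's implementation. It rests on simplifying assumptions: the automaton state is just the name of the previous tool (a one-step context), every parameter is numeric, and bounds are the min/max seen in benign traces. The names `PDFA`, `compile_traces`, and `Gateway` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PDFA:
    """Toy parameterized DFA (hypothetical structure, for illustration only)."""
    start: str = "START"
    # (state, tool) -> next state; one entry per key keeps it deterministic
    transitions: dict = field(default_factory=dict)
    # (state, tool) -> {param_name: (low, high)}
    bounds: dict = field(default_factory=dict)


def compile_traces(traces):
    """Offline step: fold benign traces into transitions and parameter bounds.

    Each trace is a list of (tool_name, params) pairs with numeric params.
    Assumption: the state after a call is simply the tool that was called.
    """
    pdfa = PDFA()
    for trace in traces:
        state = pdfa.start
        for tool, params in trace:
            key = (state, tool)
            pdfa.transitions[key] = tool
            seen = pdfa.bounds.setdefault(key, {})
            for name, value in params.items():
                low, high = seen.get(name, (value, value))
                seen[name] = (min(low, value), max(high, value))
            state = tool
    return pdfa


class Gateway:
    """Runtime step: one dictionary lookup plus bound checks per tool call."""

    def __init__(self, pdfa):
        self.pdfa = pdfa
        self.state = pdfa.start

    def check(self, tool, params):
        key = (self.state, tool)
        next_state = self.pdfa.transitions.get(key)
        if next_state is None:
            return False  # tool never followed this state in benign traces
        allowed = self.pdfa.bounds[key]
        for name, value in params.items():
            if name not in allowed:
                return False  # parameter never seen in this context
            low, high = allowed[name]
            if not low <= value <= high:
                return False  # value outside the learned range
        self.state = next_state  # advance only when the call is accepted
        return True
```

In this sketch, a gateway built from traces of a made-up `read_invoice` call followed by a made-up `approve_payment` call would reject `approve_payment` as a first call. It would also reject an `approve_payment` whose amount falls outside the learned range. All the work at call time is a hash lookup and a few comparisons, which is the property the abstract attributes to the runtime gateway.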
