AgentWall: A Runtime Safety Layer for Local AI Agents

Ashwin Aravind

arXiv:2605.16265·cs.AI·May 19, 2026

TL;DR

AgentWall is a runtime safety layer for local AI agents. It intercepts each proposed agent action, evaluates it, and controls whether it runs, which prevents unsafe behavior and keeps agent activity auditable.

Contribution

The paper introduces a runtime safety and observability layer that enforces policies on the actions of local AI agents, with high enforcement accuracy and low overhead.

Findings

01 92.9% policy enforcement accuracy
02 Sub-millisecond overhead in 14 benchmarks
03 Works across multiple AI agent platforms

Abstract

The safety of autonomous AI agents is increasingly recognized as a critical open problem. As agents transition from passive text generators to active actors capable of executing shell commands, modifying files, calling APIs, and browsing the web, the consequences of unsafe or adversarially manipulated behavior become immediate and tangible. Existing AI safety work has focused primarily on model alignment and input filtering, but these approaches do not address what happens at the moment an agent's intent becomes a real action on a real machine. This gap is especially acute in local environments, where developers run agents against their own filesystems, credentials, and infrastructure with little runtime control. This paper introduces AgentWall, a runtime safety and observability layer for local AI agents. AgentWall intercepts every proposed agent action before it reaches the host…
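To make the idea of intercepting an action before it reaches the host concrete, here is a minimal Python sketch of a policy gate. It is an illustration only and is not taken from the paper or its repository: the names `Action`, `Rule`, `Gate`, and `Verdict`, the first-match-wins rule order, and the default-deny fallback are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"


@dataclass
class Action:
    """A proposed agent action (hypothetical shape, not AgentWall's schema)."""
    kind: str    # e.g. "shell", "file_write", "http"
    target: str  # command line, file path, or URL


@dataclass
class Rule:
    name: str
    matches: Callable[[Action], bool]
    verdict: Verdict


@dataclass
class Gate:
    rules: List[Rule]
    audit_log: List[dict] = field(default_factory=list)

    def evaluate(self, action: Action) -> Verdict:
        # Assumption: first matching rule wins; unmatched actions are denied.
        verdict, rule_name = Verdict.DENY, "default-deny"
        for rule in self.rules:
            if rule.matches(action):
                verdict, rule_name = rule.verdict, rule.name
                break
        # Record every decision, allowed or not, so it can be audited later.
        self.audit_log.append(
            {"kind": action.kind, "target": action.target,
             "rule": rule_name, "verdict": verdict.value}
        )
        return verdict

    def run(self, action: Action, executor: Callable[[Action], object]) -> object:
        # The executor is only called after the action has been allowed.
        if self.evaluate(action) is Verdict.DENY:
            raise PermissionError(f"blocked: {action.kind} {action.target!r}")
        return executor(action)


# Hypothetical example policy: block recursive deletes, allow other shell commands.
gate = Gate(rules=[
    Rule("block-recursive-delete",
         lambda a: a.kind == "shell" and "rm -rf" in a.target,
         Verdict.DENY),
    Rule("allow-shell", lambda a: a.kind == "shell", Verdict.ALLOW),
])
```

In this sketch, an agent runtime would call `gate.run(action, executor)` instead of executing the action directly, so a denied action raises `PermissionError` before the executor is ever invoked, and every decision lands in `gate.audit_log`. A real system would need far more careful matching than a substring test on the command line.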

Code & Models

Repositories

agentwall/Agentwall (GitHub)
