The Cognitive Firewall: Securing Browser Based AI Agents Against Indirect Prompt Injection Via Hybrid Edge Cloud Defense
Qianlong Lan, Anuj Kaul

TL;DR
The paper introduces the Cognitive Firewall, a hybrid edge-cloud system that defends browser-based AI agents against indirect prompt injection, reducing the attack success rate to below 1% while keeping latency low.
Contribution
It proposes a three-stage split-compute architecture (a local visual Sentinel, a cloud-based Deep Planner, and a deterministic Guard) that distributes security checks across the client and the cloud for browser-based LLM agents.
Findings
Reduces attack success rate to below 1%
Achieves 17,000x latency improvement over cloud-only systems
Edge-only defenses fail to detect 86.9% of semantic attacks
Abstract
Deploying large language models (LLMs) as autonomous browser agents exposes a significant attack surface in the form of Indirect Prompt Injection (IPI). Cloud-based defenses can provide strong semantic analysis, but they introduce latency and raise privacy concerns. We present the Cognitive Firewall, a three-stage split-compute architecture that distributes security checks across the client and the cloud. The system consists of a local visual Sentinel, a cloud-based Deep Planner, and a deterministic Guard that enforces execution-time policies. Across 1,000 adversarial samples, edge-only defenses fail to detect 86.9% of semantic attacks. In contrast, the full hybrid architecture reduces the overall attack success rate (ASR) to below 1% (0.88% under static evaluation and 0.67% under adaptive evaluation), while maintaining deterministic constraints on side-effecting actions. By filtering…
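The three-stage flow described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: every name here (`sentinel`, `deep_planner`, `guard`, `run_pipeline`, `Action`, `SIDE_EFFECTING`) is hypothetical, the Sentinel is reduced to a text keyword check even though the paper describes it as visual, the cloud planner is a stub that returns a fixed plan, and the host allowlist is an invented example of a deterministic execution-time policy.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set
from urllib.parse import urlparse

# Hypothetical illustration only: names, rules, and policies are assumptions,
# not taken from the paper.


@dataclass(frozen=True)
class Action:
    kind: str    # e.g. "click", "submit_form"
    target: str  # URL the action would touch


# Assumed set of action kinds that change state outside the browser tab.
SIDE_EFFECTING = {"submit_form", "download", "purchase"}


def sentinel(page_text: str) -> bool:
    """Stage 1 (edge): cheap local screen; True means the page looks suspicious.

    The paper's Sentinel is visual; a keyword check stands in for it here.
    """
    markers = ("ignore previous instructions", "disregard the above")
    lowered = page_text.lower()
    return any(marker in lowered for marker in markers)


def deep_planner(page_text: str, goal: str) -> Action:
    """Stage 2 (cloud): stub for a remote semantic planner; returns a fixed plan."""
    return Action(kind="submit_form", target="https://shop.example/checkout")


def guard(action: Action, allowed_hosts: Set[str]) -> bool:
    """Stage 3: deterministic check applied just before execution."""
    if action.kind not in SIDE_EFFECTING:
        return True
    return urlparse(action.target).hostname in allowed_hosts


def run_pipeline(
    page_text: str,
    goal: str,
    allowed_hosts: Set[str],
    planner: Callable[[str, str], Action] = deep_planner,
) -> Optional[Action]:
    """Return the action to execute, or None if any stage blocks it."""
    if sentinel(page_text):
        return None  # blocked locally; the cloud is never contacted
    action = planner(page_text, goal)
    return action if guard(action, allowed_hosts) else None
```

In this sketch the Guard runs last and uses no model at all, so a planner that has been misled by injected content still cannot trigger a side-effecting action against a host outside the allowlist. Whether the real system short-circuits before the cloud call in the same way is an assumption of the sketch.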
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Security and Verification in Computing · Spam and Phishing Detection
