Building Browser Agents: Architecture, Security, and Practical Solutions
Aram Vardanyan

TL;DR
This paper analyzes the architecture and security challenges of autonomous browser agents, proposing specialized tools and hybrid context management to improve safety and performance in real-world web interaction tasks.
Contribution
It introduces a security-focused architectural approach and hybrid context management techniques that significantly enhance the reliability and safety of production browser agents.
Findings
Achieved 85% success on the WebGames benchmark, surpassing prior agents.
Identified prompt injection attacks as a key security vulnerability.
Argued that specialized tools with programmatic constraints are safer than general-purpose browsing agents.
Abstract
Browser agents enable autonomous web interaction but face critical reliability and security challenges in production. This paper presents findings from building and operating a production browser agent. The analysis examines where current approaches fail and what prevents safe autonomous operation. The fundamental insight: model capability does not limit agent performance; architectural decisions determine success or failure. Security analysis of real-world incidents reveals prompt injection attacks make general-purpose autonomous operation fundamentally unsafe. The paper argues against developing general browsing intelligence in favor of specialized tools with programmatic constraints, where safety boundaries are enforced through code instead of large language model (LLM) reasoning. Through hybrid context management combining accessibility tree snapshots with selective vision,…
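To make the idea of safety boundaries "enforced through code instead of LLM reasoning" concrete, here is a minimal Python sketch. It is not taken from the paper: the names `Action`, `PolicyViolation`, `check_action`, and the allowlist values are all hypothetical, and a real agent would likely need a richer policy. The point is only that the check runs as ordinary code on every proposed action, so text injected into a page cannot talk the agent past it.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical policy values, chosen only for illustration.
ALLOWED_HOSTS = frozenset({"example.com", "docs.example.com"})
ALLOWED_ACTIONS = frozenset({"navigate", "click", "read"})


@dataclass(frozen=True)
class Action:
    """An action proposed by the model (hypothetical shape)."""
    kind: str  # e.g. "navigate", "click", "read"
    url: str   # page the action targets


class PolicyViolation(Exception):
    """Raised when a proposed action falls outside the coded boundary."""


def check_action(action: Action) -> Action:
    """Return the action unchanged if it is allowed, else raise."""
    if action.kind not in ALLOWED_ACTIONS:
        raise PolicyViolation(f"action kind not allowed: {action.kind!r}")
    parsed = urlparse(action.url)
    if parsed.scheme != "https":
        raise PolicyViolation(f"scheme not allowed: {parsed.scheme!r}")
    # Exact host match, so "example.com.evil.test" does not pass.
    if parsed.hostname not in ALLOWED_HOSTS:
        raise PolicyViolation(f"host not allowed: {parsed.hostname!r}")
    return action
```

In this sketch the executor would call `check_action` before performing anything the model proposes; a rejected action raises an exception rather than being handed back to the model for reconsideration.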
Taxonomy
Topics: Digital Accessibility for Disabilities · Web Application Security Vulnerabilities · Speech and dialogue systems
