The BrowserGym Ecosystem for Web Agent Research
Thibault Le Sellier De Chezelles, Maxime Gasse, Alexandre Drouin, Massimo Caccia, Léo Boisvert, Megh Thakkar, Tom Marty, Rim Assouel, Sahar Omidi Shayegan, Lawrence Keunho Jang, Xing Han Lù, Ori Yoran, Dehan Kong, Frank F. Xu, Siva Reddy, Quentin Cappart, Graham Neubig

TL;DR
This paper introduces an extended BrowserGym ecosystem for web agent research, unifying benchmarks and providing tools for standardized evaluation, demonstrated through large-scale experiments comparing top LLMs across multiple web benchmarks.
Contribution
It extends BrowserGym to unify existing benchmarks and includes AgentLab for agent development, testing, and analysis, enabling consistent evaluation and comprehensive experiment management.
Findings
Claude-3.5-Sonnet outperforms others on most benchmarks
GPT-4o excels in vision-related tasks
Building robust web agents remains challenging due to environment complexity
Abstract
The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents, particularly those leveraging automation and Large Language Models (LLMs). Many existing benchmarks suffer from fragmentation and inconsistent evaluation methodologies, making it challenging to achieve reliable comparisons and reproducible results. In an earlier work, Drouin et al. (2024) introduced BrowserGym, which aims to solve this by providing a unified, gym-like environment with well-defined observation and action spaces, facilitating standardized evaluation across diverse benchmarks. We propose an extended BrowserGym-based ecosystem for web agent research, which unifies existing benchmarks from the literature and includes AgentLab, a complementary framework that aids in agent creation, testing, and analysis. Our proposed ecosystem offers flexibility for integrating new…
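To make the phrase "gym-like environment with well-defined observation and action spaces" concrete, here is a minimal Python sketch of the reset/step loop such an interface implies. It does not use the BrowserGym package: ToyWebEnv, scripted_agent, run_episode, the observation fields, and the click('...') action strings are all hypothetical names invented for illustration. Only the return shapes of reset and step follow the widely used Gymnasium convention.

```python
from typing import Any, Callable, Dict, Tuple

Observation = Dict[str, Any]


class ToyWebEnv:
    """Hypothetical stand-in for a browser task (not the BrowserGym API)."""

    def __init__(self, max_steps: int = 5) -> None:
        self.max_steps = max_steps
        self.steps = 0

    def _observe(self) -> Observation:
        # Assumed observation space: a goal string plus the clickable elements.
        return {"goal": "Click the submit button", "elements": ["cancel", "submit"]}

    def reset(self) -> Tuple[Observation, dict]:
        # Gymnasium-style reset: returns (observation, info).
        self.steps = 0
        return self._observe(), {}

    def step(self, action: str) -> Tuple[Observation, float, bool, bool, dict]:
        # Gymnasium-style step: (observation, reward, terminated, truncated, info).
        self.steps += 1
        terminated = action == "click('submit')"  # the task is solved
        truncated = (not terminated) and self.steps >= self.max_steps
        reward = 1.0 if terminated else 0.0
        return self._observe(), reward, terminated, truncated, {}


def scripted_agent(obs: Observation) -> str:
    # Assumed action space: strings of the form click('<element>').
    target = "submit" if "submit" in obs["elements"] else obs["elements"][0]
    return f"click('{target}')"


def run_episode(env: ToyWebEnv, agent: Callable[[Observation], str]) -> float:
    """Run one episode with the same loop for any agent; return total reward."""
    obs, _ = env.reset()
    total = 0.0
    while True:
        obs, reward, terminated, truncated, _ = env.step(agent(obs))
        total += reward
        if terminated or truncated:
            return total


if __name__ == "__main__":
    print(run_episode(ToyWebEnv(), scripted_agent))  # prints 1.0
```

Because every agent is driven through the same run_episode loop, swapping the agent or the environment leaves the evaluation code unchanged, which is the kind of standardization the abstract describes.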
Taxonomy
Topics: Peer-to-Peer Network Technologies · Scientific Computing and Data Management
Methods: Fragmentation
