From Grounding to Planning: Benchmarking Bottlenecks in Web Agents
Segev Shlomov, Ben Wiesel, Aviad Sela, Ido Levy, Liane Galanti, Roy Abitbol

TL;DR
This paper analyzes web-based agents by separating planning and grounding components, revealing that planning is the main bottleneck limiting performance, and introduces benchmarks to identify and address these challenges.
Contribution
It refines the analysis of web agents by benchmarking planning and grounding separately, highlighting planning as the key bottleneck and providing practical improvement suggestions.
Findings
Grounding is not a significant bottleneck with current techniques.
Planning is the primary source of performance degradation.
New benchmarks effectively identify bottlenecks in web agents.
Abstract
General web-based agents are increasingly essential for interacting with complex web environments, yet their performance in real-world web applications remains poor, yielding extremely low accuracy even with state-of-the-art frontier models. We observe that these agents can be decomposed into two primary components: Planning and Grounding. Yet, most existing research treats these agents as black boxes, focusing on end-to-end evaluations which hinder meaningful improvements. We sharpen the distinction between the planning and grounding components and conduct a novel analysis by refining experiments on the Mind2Web dataset. Our work proposes a new benchmark for each of the components separately, identifying the bottlenecks and pain points that limit agent performance. Contrary to prevalent assumptions, our findings suggest that grounding is not a significant bottleneck and can be…
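The planning/grounding split described in the abstract can be illustrated with a toy Python sketch. This is not the authors' benchmark or code. The `GoldStep` schema, the `Planner` and `Grounder` signatures, the exact-match `oracle_ground`, and the toy planner and grounder are all hypothetical. They only show one way to score each component while the other is held at the gold answer. Grounding is scored by giving the grounder the gold target description. Planning is scored by resolving the planner's description with a perfect grounder, so only planning mistakes lower its score.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical page representation: element id -> visible text of that element.
Page = Dict[str, str]


@dataclass(frozen=True)
class GoldStep:
    """One annotated step of a web task (hypothetical, loosely Mind2Web-like)."""
    task: str
    page: Page
    operation: str   # e.g. "CLICK" or "TYPE"
    element_id: str  # id of the element the annotator acted on


# Planning decides WHAT to do next: an operation plus a textual target description.
Planner = Callable[[str, Page], Tuple[str, str]]
# Grounding decides WHERE: it maps a textual description to a concrete element id.
Grounder = Callable[[str, Page], str]


def grounding_accuracy(grounder: Grounder, steps: List[GoldStep]) -> float:
    """Isolate grounding: feed the gold target description, check the chosen element."""
    hits = sum(grounder(s.page[s.element_id], s.page) == s.element_id for s in steps)
    return hits / len(steps)


def oracle_ground(description: str, page: Page) -> str:
    """Perfect grounder for this toy setting: exact text match ("" if none)."""
    return next((eid for eid, text in page.items() if text == description), "")


def planning_accuracy(planner: Planner, steps: List[GoldStep]) -> float:
    """Isolate planning: resolve the plan with the oracle so grounding errors cannot occur."""
    hits = 0
    for s in steps:
        op, description = planner(s.task, s.page)
        hits += op == s.operation and oracle_ground(description, s.page) == s.element_id
    return hits / len(steps)


# --- Usage with deliberately simple, hypothetical components ---
steps = [
    GoldStep("Search for flights", {"e1": "Search", "e2": "Sign in"}, "CLICK", "e1"),
    GoldStep("Log into my account", {"e1": "Search", "e2": "Sign in"}, "CLICK", "e2"),
]


def naive_planner(task: str, page: Page) -> Tuple[str, str]:
    # Ignores the task and always targets the first element: a weak planner.
    return "CLICK", next(iter(page.values()))


def fuzzy_grounder(description: str, page: Page) -> str:
    # Case-insensitive substring match, falling back to the first element.
    for eid, text in page.items():
        if description.lower() in text.lower():
            return eid
    return next(iter(page))


print(grounding_accuracy(fuzzy_grounder, steps))  # 1.0
print(planning_accuracy(naive_planner, steps))    # 0.5
```

In this toy run the grounder is perfect while the planner fails half the steps. An end-to-end score alone (0.5 here) would not reveal which component is responsible.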
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Logic, Reasoning, and Knowledge · Mobile Agent-Based Network Management
