TL;DR
AgentWebBench is a benchmark for evaluating multi-agent coordination in the emerging Agentic Web paradigm. It measures how well a user agent synthesizes answers by interacting with website-specific content agents, across multiple models, tasks, and coordination strategies.
Contribution
It introduces a benchmark for multi-agent web interaction covering four tasks (web search, web recommendation, question answering, and deep research), evaluates seven LLMs under three coordination strategies, and analyzes the properties and challenges of decentralized web information access.
Findings
Multi-agent coordination lags behind centralized retrieval but improves with model scale.
On question answering, multi-agent approaches can outperform centralized retrieval.
Decentralized access concentrates traffic and benefits from better planning and interaction scaling.
Abstract
The Agentic Web is an emerging paradigm in which autonomous agents help users use online information. As the paradigm develops, content providers are also deploying agents to manage their data and serve it through controlled interfaces. This shift moves information access from centralized retrieval to decentralized coordination. To study this setting, we introduce AgentWebBench, a benchmark that evaluates how well a user agent synthesizes answers by interacting with website-specific content agents. We evaluate four tasks that cover common web information needs, spanning ranked retrieval (web search, web recommendation) and open-ended synthesis (question answering, deep research). Across seven advanced LLMs and three coordination strategies, multi-agent coordination generally lags behind centralized retrieval, as expected, because the user agent cannot directly access the corpus, but the gap shrinks…
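The contrast between centralized retrieval and decentralized coordination described in the abstract can be illustrated with a toy sketch. This is not the benchmark's implementation: the names `ContentAgent`, `centralized_retrieve`, and `decentralized_retrieve`, the token-overlap scoring, and the one-result-per-site limit are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


def overlap(query: str, text: str) -> int:
    """Toy relevance score: number of lowercase tokens shared by query and text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


@dataclass
class ContentAgent:
    """Hypothetical website-specific agent; its documents are only reachable via answer()."""
    site: str
    docs: list[str]

    def answer(self, query: str, k: int = 1) -> list[str]:
        # Return at most k documents that share at least one token with the query.
        ranked = sorted(self.docs, key=lambda d: overlap(query, d), reverse=True)
        return [d for d in ranked[:k] if overlap(query, d) > 0]


def centralized_retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Baseline: rank the whole corpus directly."""
    return sorted(corpus, key=lambda d: overlap(query, d), reverse=True)[:k]


def decentralized_retrieve(query: str, agents: list[ContentAgent], k: int = 2) -> list[str]:
    """User agent (assumed strategy): ask every content agent, then merge and re-rank."""
    candidates: list[str] = []
    for agent in agents:
        candidates.extend(agent.answer(query, k=1))
    return sorted(candidates, key=lambda d: overlap(query, d), reverse=True)[:k]
```

In this sketch the user agent sees only what each content agent chooses to return, so when one site holds several of the best documents, the decentralized result can differ from the centralized ranking. That is one simple way to picture why a user agent without direct corpus access may trail a centralized retriever.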
