AgentWebBench: Benchmarking Multi-Agent Coordination in Agentic Web

Shanshan Zhong; Kate Shen; Chenyan Xiong

arXiv:2604.10938·cs.MA·April 14, 2026

TL;DR

AgentWebBench is a benchmark for multi-agent coordination in the emerging Agentic Web. It measures how well a user agent synthesizes answers by interacting with website-specific content agents, across four tasks, seven LLMs, and three coordination strategies.

Contribution

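It introduces a benchmark for multi-agent web interaction covering ranked retrieval (web search, web recommendation) and open-ended synthesis (question answering, deep research), evaluates seven LLMs under three coordination strategies, and provides insights into the properties and challenges of decentralized web information access.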

Findings

01

Multi-agent coordination lags behind centralized retrieval but improves with model scale.

02

On question answering, multi-agent approaches can outperform centralized retrieval.

03

Decentralized access concentrates traffic and benefits from better planning and interaction scaling.

Abstract

The Agentic Web is an emerging paradigm in which autonomous agents help users use online information. As the paradigm develops, content providers are also deploying agents to manage their data and serve it through controlled interfaces. This shift moves information access from centralized retrieval to decentralized coordination. To study this setting, we introduce AgentWebBench, a benchmark that evaluates how well a user agent synthesizes answers by interacting with website-specific content agents. We evaluate four tasks that cover common web information needs, spanning ranked retrieval (web search, web recommendation) and open-ended synthesis (question answering, deep research). Across seven advanced LLMs and three coordination strategies, multi-agent coordination generally lags behind centralized retrieval, as expected, because the user agent cannot directly access the corpus, but the gap shrinks…
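To make the contrast between decentralized coordination and centralized retrieval concrete, here is a minimal toy sketch. It is not the benchmark's implementation: the names `ContentAgent`, `user_agent`, `centralized_retrieval`, and `overlap_score` are hypothetical, the site names and documents are made up, and relevance is plain word overlap rather than anything an LLM would do. The one property it tries to show is that the user agent sees only what each content agent chooses to return, while the centralized baseline reads the pooled corpus directly.

```python
from __future__ import annotations

from dataclasses import dataclass


def overlap_score(query: str, text: str) -> int:
    """Toy relevance (assumption): count of distinct query words found in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def rank(scored: list[tuple[int, str, str]], k: int) -> list[tuple[int, str, str]]:
    """Drop zero-score items, sort by score descending (ties by text), keep top k."""
    kept = [item for item in scored if item[0] > 0]
    kept.sort(key=lambda item: (-item[0], item[2]))
    return kept[:k]


@dataclass
class ContentAgent:
    """Hypothetical website-side agent; only it can read its own documents."""
    site: str
    documents: list[str]

    def answer(self, query: str, k: int = 1) -> list[tuple[int, str, str]]:
        # The controlled interface: return at most k (score, site, text) items.
        scored = [(overlap_score(query, doc), self.site, doc) for doc in self.documents]
        return rank(scored, k)


def user_agent(query: str, agents: list[ContentAgent], k: int = 2):
    """Hypothetical user-side agent: merges whatever the content agents return."""
    candidates = []
    for agent in agents:
        candidates.extend(agent.answer(query, k=1))
    return rank(candidates, k)


def centralized_retrieval(query: str, agents: list[ContentAgent], k: int = 2):
    """Baseline with direct access to every document in the pooled corpus."""
    scored = [
        (overlap_score(query, doc), agent.site, doc)
        for agent in agents
        for doc in agent.documents
    ]
    return rank(scored, k)


# Made-up sites and documents, for illustration only.
agents = [
    ContentAgent("news.example", [
        "agent benchmark released for web search",
        "agent benchmark for web agents announced",
    ]),
    ContentAgent("blog.example", ["cooking pasta at home"]),
]
query = "agent benchmark web"

decentralized = user_agent(query, agents)
centralized = centralized_retrieval(query, agents)
print(len(decentralized), len(centralized))
```

In this toy setup both relevant documents live on one site, and that site's agent returns only one item per request, so the user agent ends up with less evidence than the centralized baseline. This is only one simple way such a gap could arise; the paper's coordination strategies and metrics are not modeled here.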

Code & Models

Repositories

cxcscmu/AgentWebBench (GitHub)
