X-WebAgentBench: A Multilingual Interactive Web Benchmark for Evaluating Global Agentic System
Peng Wang, Ruihan Tao, Qiguang Chen, Mengkang Hu, Libo Qin

TL;DR
X-WebAgentBench is a multilingual web benchmark for evaluating, and ultimately improving, the planning and interaction abilities of language agents across diverse languages. It addresses a gap left by current research, which focuses mainly on English.
Contribution
It introduces a comprehensive multilingual interactive web benchmark and evaluates various LLMs and cross-lingual methods, highlighting the challenges in achieving high performance across languages.
Findings
Even advanced models such as GPT-4o still underperform on multilingual tasks.
Cross-lingual alignment methods do not fully bridge the performance gap.
The benchmark facilitates future research in global multilingual agent development.
Abstract
Recently, large language model (LLM)-based agents have achieved notable success in interactive environments, attracting significant academic and industrial attention. Despite these advances, current research focuses predominantly on English scenarios. In reality, there are over 7,000 languages worldwide, and speakers of all of them need access to comparable agentic services. Nevertheless, the development of language agents remains inadequate for meeting the diverse requirements of multilingual agentic applications. To fill this gap, we introduce X-WebAgentBench, a novel multilingual agent benchmark in an interactive web environment, which evaluates the planning and interaction performance of language agents across multiple languages, thereby contributing to the advancement of global agent intelligence. Additionally, we assess the performance of various LLMs and cross-lingual alignment…
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Semantic Web and Ontologies
