X-WebAgentBench: A Multilingual Interactive Web Benchmark for Evaluating Global Agentic System

Peng Wang; Ruihan Tao; Qiguang Chen; Mengkang Hu; Libo Qin

arXiv:2505.15372 · cs.CL · May 22, 2025

TL;DR

X-WebAgentBench is a new multilingual web benchmark designed to evaluate and improve the planning and interaction abilities of language agents across diverse languages, addressing a significant gap in current research focused mainly on English.

Contribution

It introduces a comprehensive multilingual interactive web benchmark and evaluates various LLMs and cross-lingual methods, highlighting the challenges in achieving high performance across languages.

Findings

1. Advanced models like GPT-4o still underperform in multilingual tasks.
2. Cross-lingual alignment methods do not fully bridge the performance gap.
3. The benchmark facilitates future research in global multilingual agent development.

Abstract

Recently, large language model (LLM)-based agents have achieved notable success in interactive environments, attracting significant academic and industrial attention. Despite these advancements, current research predominantly focuses on English scenarios. In reality, there are over 7,000 languages worldwide, and speakers of all of them need access to comparable agentic services. Nevertheless, the development of language agents remains inadequate for meeting the diverse requirements of multilingual agentic applications. To fill this gap, we introduce X-WebAgentBench, a novel multilingual agent benchmark in an interactive web environment, which evaluates the planning and interaction performance of language agents across multiple languages, thereby contributing to the advancement of global agent intelligence. Additionally, we assess the performance of various LLMs and cross-lingual alignment…

Code & Models

Repositories

WPENGxs/X-WebAgentBench (PyTorch, official)

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Semantic Web and Ontologies