MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive and MCP-Augmented Environments
Quyu Kong, Xu Zhang, Zhenyu Yang, Nolan Gao, Chen Liu, Panrong Tong, Chenglin Cai, Hanzhang Zhou, Jianan Zhang, Liangyu Chen, Zhidan Liu, Steven Hoi, and Yue Wang

TL;DR
MobileWorld is a new, more challenging benchmark for evaluating autonomous mobile agents, emphasizing real-world workflows, multi-application tasks, and user interactions, revealing significant performance gaps and research opportunities.
Contribution
The paper introduces MobileWorld, a comprehensive benchmark with diverse tasks and novel scenarios, addressing limitations of existing benchmarks like AndroidWorld.
Findings
MobileWorld tasks require nearly twice as many completion steps on average as AndroidWorld tasks (27.8 vs. 14.3).
Agent performance drops sharply on MobileWorld, with success rates below 52%, compared with over 90% reported on AndroidWorld.
The benchmark enables evaluation of user-aware, hybrid-tool scenarios.
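As a quick check on the "nearly twice as many steps" finding, the ratio can be worked out from the averages reported in the abstract (27.8 for MobileWorld, 14.3 for AndroidWorld). This is only an illustrative sketch; the variable names are hypothetical and not taken from the paper or its code.

```python
# Average completion steps per task, as reported in the abstract.
# Variable names are illustrative, not from the paper's code.
mobileworld_avg_steps = 27.8
androidworld_avg_steps = 14.3

# Ratio of average task length between the two benchmarks.
ratio = mobileworld_avg_steps / androidworld_avg_steps
print(f"MobileWorld / AndroidWorld step ratio: {ratio:.2f}")  # prints 1.94
```

A ratio of about 1.94 is consistent with the "nearly twice" wording.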
Abstract
Among existing online mobile-use benchmarks, AndroidWorld has emerged as the dominant benchmark due to its reproducible environment and deterministic evaluation; however, recent agents achieving over 90% success rates indicate its saturation and motivate the need for a more challenging benchmark. In addition, its environment lacks key application categories, such as e-commerce and enterprise communication, and does not reflect realistic mobile-use scenarios characterized by vague user instructions and hybrid tool usage. We introduce MobileWorld, a substantially more challenging benchmark designed to reflect real-world usage through 201 tasks across 20 applications. MobileWorld derives its difficulty from an emphasis on long-horizon, cross-application workflows, requiring nearly twice as many completion steps on average (27.8 vs. 14.3) and featuring a significantly higher proportion of…
Taxonomy
Topics: Mobile Agent-Based Network Management · Advanced Software Engineering Methodologies · Multi-Agent Systems and Negotiation
