MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents
Luyuan Wang, Yongyu Deng, Yiwei Zha, Guodong Mao, Qinmin Wang, Tianchen Min, Wei Chen, Shoufa Chen

TL;DR
MobileAgentBench is a new benchmark for efficiently evaluating mobile LLM agents across diverse tasks. It targets a problem that has made such benchmarking difficult: apps have an enormous number of possible states, and the set of feasible action sequences for a task is poorly defined.
Contribution
It introduces a user-friendly benchmark of 100 tasks across 10 open-source apps, grouped by difficulty level, enabling systematic comparison of existing mobile agents.
Findings
Evaluated several existing mobile agents, including AppAgent and MobileAgent, and compared their performance systematically.
Provided a publicly accessible benchmark platform for future research.
Facilitated standardized assessment of mobile LLM agents.
Abstract
Large language model (LLM)-based mobile agents are increasingly popular because they can interact directly with mobile phone graphical user interfaces (GUIs) and could potentially manage daily tasks autonomously. Despite their promise in both academia and industry, little research has focused on benchmarking existing mobile agents, owing to the practically unbounded state space of apps and the vague definition of feasible action sequences. To address this challenge, we propose MobileAgentBench, an efficient and user-friendly benchmark designed to reduce the burden of extensive manual testing. We first define 100 tasks across 10 open-source apps, categorized into multiple levels of difficulty. We then evaluate several existing mobile agents, including AppAgent and MobileAgent, to compare their performance thoroughly and systematically.…
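The abstract describes tasks grouped by difficulty level and a systematic comparison of agents. As a minimal sketch of how such results might be aggregated, assuming each task carries a hypothetical integer difficulty level and each agent run yields a single success/failure flag, the code below computes per-agent success rates overall and per level. The `Task` and `Result` fields and the `success_rates` function are illustrative names only, not MobileAgentBench's actual schema, API, or metric definitions.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical task record; fields are illustrative assumptions,
# not the benchmark's real task format.
@dataclass(frozen=True)
class Task:
    task_id: str
    app: str         # which app the task runs in
    difficulty: int  # hypothetical level, e.g. 1 (easy) to 3 (hard)


# Hypothetical outcome of one agent attempting one task.
@dataclass(frozen=True)
class Result:
    agent: str
    task_id: str
    success: bool  # True if the agent completed the task


def success_rates(tasks, results):
    """Return {agent: {"overall": rate, level: rate, ...}} from binary outcomes."""
    level_of = {t.task_id: t.difficulty for t in tasks}
    # counts[agent][key] holds [successes, attempts] for "overall" and each level.
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for r in results:
        level = level_of[r.task_id]
        for key in ("overall", level):
            counts[r.agent][key][0] += int(r.success)
            counts[r.agent][key][1] += 1
    return {
        agent: {key: s / n for key, (s, n) in per_key.items()}
        for agent, per_key in counts.items()
    }


if __name__ == "__main__":
    # Toy data with hypothetical agent and app names; not results from the paper.
    tasks = [Task("t1", "app_1", 1), Task("t2", "app_1", 2), Task("t3", "app_2", 2)]
    results = [
        Result("agent_a", "t1", True),
        Result("agent_a", "t2", False),
        Result("agent_a", "t3", True),
    ]
    print(success_rates(tasks, results))
```

Keeping the per-level breakdown next to the overall rate makes it easy to see whether an agent's failures concentrate on harder tasks. A real harness would also need to decide how each task's success is actually detected on the device, which this sketch leaves out.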
Taxonomy
Topics: Mobile Agent-Based Network Management · Peer-to-Peer Network Technologies · Distributed systems and fault tolerance
