MobiFlow: Real-World Mobile Agent Benchmarking through Trajectory Fusion

Yunfei Feng; Xi Zhao; Cheng Zhang; Dahu Feng; Daolin Cheng; Jianqi Yu; Yubin Xia; Erhu Feng

arXiv:2604.09587·cs.AI·April 14, 2026


TL;DR

MobiFlow is a new benchmarking framework for mobile agents that better reflects real-world scenarios by using multi-trajectory fusion to evaluate third-party app interactions.

Contribution

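The paper introduces MobiFlow, an evaluation framework built on tasks from arbitrary third-party apps. It constructs a graph by fusing multiple trajectories, which compresses the state space and supports dynamic interaction, bringing evaluation closer to real-world mobile agent usage.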

Findings

01

MobiFlow covers 20 third-party apps with 240 real-world tasks.

02

Its evaluation results align more closely with human assessments than those of AndroidWorld.

03

It supports dynamic interaction and compresses the state space efficiently.

Abstract

Mobile agents can autonomously complete user-assigned tasks through GUI interactions. However, existing mainstream evaluation benchmarks, such as AndroidWorld, operate by connecting to a system-level Android emulator and provide evaluation signals based on the state of system resources. In real-world mobile-agent scenarios, many third-party applications do not expose system-level APIs for determining whether a task has succeeded. This creates a mismatch between benchmarks and real-world usage and makes it difficult to evaluate model performance accurately. To address these issues, we propose MobiFlow, an evaluation framework built on tasks drawn from arbitrary third-party applications. Using an efficient graph-construction algorithm based on multi-trajectory fusion, MobiFlow can effectively compress the state space, support dynamic interaction, and better align with real-world…
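
The abstract does not spell out the graph-construction algorithm, so the following is only a minimal sketch of the general idea of fusing several trajectories into one graph, not MobiFlow's actual method. It assumes each recorded trajectory is a list of (state_key, action) steps and that identical state keys denote the same screen; the names fuse_trajectories and reaches_goal, and the state and action labels, are hypothetical.

```python
from collections import defaultdict


def fuse_trajectories(trajectories):
    """Merge trajectories into one directed graph (illustrative only).

    Each trajectory is a list of (state_key, action) steps, where the action
    leads to the next step's state and the final step's action is None.
    Steps sharing a state_key collapse into a single node.
    """
    graph = defaultdict(set)  # state_key -> {(action, next_state_key)}
    for traj in trajectories:
        for (state, action), (next_state, _) in zip(traj, traj[1:]):
            graph[state].add((action, next_state))
        if traj:
            # Make sure the terminal state exists as a node even without edges.
            graph.setdefault(traj[-1][0], set())
    return dict(graph)


def reaches_goal(graph, visited_states, goal):
    """Return True if the visited states follow graph edges and end at goal."""
    if not visited_states or visited_states[-1] != goal:
        return False
    for cur, nxt in zip(visited_states, visited_states[1:]):
        successors = {n for _, n in graph.get(cur, ())}
        if nxt not in successors:
            return False
    return True


# Two hypothetical demonstrations of the same task via different routes.
demos = [
    [("home", "tap_search"), ("search", "type_query"),
     ("results", "tap_item"), ("detail", None)],
    [("home", "tap_category"), ("category", "tap_item"), ("detail", None)],
]
graph = fuse_trajectories(demos)
total_steps = sum(len(t) for t in demos)
print(f"{total_steps} recorded steps fused into {len(graph)} nodes")
print(reaches_goal(graph, ["home", "category", "detail"], "detail"))
print(reaches_goal(graph, ["home", "results", "detail"], "detail"))
```

In this toy case the shared "home" and "detail" screens are merged, so seven recorded steps become five nodes, and an agent run is accepted only if every transition it makes appears in the fused graph. A real system would also need a way to decide when two screens count as the same state, which this sketch sidesteps by assuming precomputed keys.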
