TL;DR
OpenMobile introduces an open-source framework for synthesizing task instructions and trajectories to improve mobile agent performance; agents trained on its data outperform existing open-data methods on AndroidWorld.
Contribution
It presents a scalable task synthesis pipeline and a policy-switching strategy for trajectory generation, enhancing mobile agent training and evaluation.
Findings
Agents trained on OpenMobile data outperform existing open-data methods.
Fine-tuned models reach 51.7% and 64.7% success rates on AndroidWorld.
Analysis shows performance gains are due to broad functionality coverage.
Abstract
Mobile agents powered by vision-language models have demonstrated impressive capabilities in automating mobile tasks, with recent leading models achieving a marked performance leap, e.g., nearly 70% success on AndroidWorld. However, these systems keep their training data closed and remain opaque about their task and trajectory synthesis recipes. We present OpenMobile, an open-source framework that synthesizes high-quality task instructions and agent trajectories, with two key components: (1) a scalable task synthesis pipeline that constructs a global environment memory from exploration, then leverages it to generate diverse and grounded instructions, and (2) a policy-switching strategy for trajectory rollout. By alternating between learner and expert models, this strategy captures essential error-recovery data often missing in standard imitation learning. Agents trained on our data…
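The policy-switching idea in component (2) can be illustrated with a toy rollout. This is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not say when control switches, so the fixed-length alternation (`segment_len`), the integer-line environment, and every name below (`rollout_with_policy_switching`, `learner_policy`, `expert_policy`) are hypothetical.

```python
from typing import Callable, List, Tuple

State = int
Action = int
Policy = Callable[[State], Action]
Step = Tuple[State, Action, str]  # (state, action, which policy acted)

GOAL = 3  # toy goal position on an integer line (assumption)


def learner_policy(state: State) -> Action:
    """Imperfect learner: steps toward the goal, except for a mistake at state 1."""
    return -1 if state == 1 else 1


def expert_policy(state: State) -> Action:
    """Expert: always steps toward the goal."""
    return 1 if state < GOAL else -1


def rollout_with_policy_switching(
    learner: Policy,
    expert: Policy,
    start: State,
    goal: State,
    segment_len: int = 2,  # assumed switching rule: alternate every N steps
    max_steps: int = 20,
) -> Tuple[List[Step], bool]:
    """Roll out one episode, alternating control between learner and expert."""
    trajectory: List[Step] = []
    state = start
    for step in range(max_steps):
        if state == goal:
            break
        # Even-numbered segments belong to the learner, odd ones to the expert.
        use_learner = (step // segment_len) % 2 == 0
        policy = learner if use_learner else expert
        actor = "learner" if use_learner else "expert"
        action = policy(state)
        trajectory.append((state, action, actor))
        state += action  # toy transition: the action is a signed step
    return trajectory, state == goal


trajectory, success = rollout_with_policy_switching(
    learner_policy, expert_policy, start=0, goal=GOAL
)
for state, action, actor in trajectory:
    print(f"state={state} action={action:+d} actor={actor}")
```

In this toy run the learner steps back from state 1 to state 0, and the expert's next segment starts from that off-track state. Expert actions taken from states the learner reached by mistake are the kind of error-recovery data that a pure expert rollout would never visit.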
