Mobile-R1: Towards Interactive Capability for VLM-Based Mobile Agent via Systematic Training

Jihao Gu; Qihang Ai; Yingyao Wang; Pi Bu; Jingxuan Xing; Zekun Zhu; Wei Jiang; Ziming Wang; Yingxiu Zhao; Ming-Liang Zhang; Jun Song; Yuning Jiang; Bo Zheng

arXiv:2506.20332·cs.AI·April 28, 2026

TL;DR

Mobile-R1 introduces a hierarchical training approach for vision-language mobile agents, improving exploration and self-correction capabilities, and provides a new Chinese GUI dataset and benchmark.

Contribution

The paper presents a systematic, hierarchical training recipe, together with a Chinese GUI dataset and benchmark, for improving the exploration and self-correction capabilities of VLM-based mobile agents.

Findings

01 The hierarchical curriculum improves exploration and self-correction.

02 The proposed dataset covers 28 applications with 24,521 annotations.

03 Open-sourced resources include the dataset, benchmark, model weights, and code.

Abstract

Vision-language model-based mobile agents have gained the ability to understand complex instructions and mobile screenshots, benefiting from reinforcement learning paradigms like Group Relative Policy Optimization (GRPO). However, existing approaches that center on offline training or local action-level rewards often trap agents in local optima, hindering effective exploration and error correction through interaction with the environment. Crucially, we find that directly applying task-level rewards often leads to convergence difficulties due to the sparse nature of GUI interactions. To address these challenges, we present Mobile-R1, a systematic training recipe that bridges atomic action execution and strategic task completion. We propose a hierarchical curriculum consisting of three stages: (1) format alignment for reasoning structure, (2) on-policy exploration with verifiable action feedback to…
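To make the sparse-reward remark concrete, here is a minimal sketch of the group-relative advantage normalization commonly associated with GRPO, where each rollout's reward is standardized against the other rollouts sampled for the same prompt. This is not the paper's implementation: the function name `group_relative_advantages`, the `eps` constant, the use of the population standard deviation, and the reward values are all illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each rollout's reward against its own sampling group.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero; real implementations may differ in these details.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical task-level rewards for four rollouts of one instruction.
# Every rollout fails, so all advantages are zero and the group carries
# no learning signal.
sparse = group_relative_advantages([0.0, 0.0, 0.0, 0.0])

# Hypothetical group in which two rollouts succeed: successes receive
# positive advantages, failures negative ones.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Under these assumptions, a group in which every rollout receives the same reward yields all-zero advantages. That is one plausible reading of why purely task-level rewards can be hard to optimize early in training, and why denser, verifiable action-level feedback may help before task-level rewards are introduced.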

Code & Models

Repositories

https://mobile-r1.github.io/Mobile-R1
