MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment

Yucheng Shi; Wenhao Yu; Zaitang Li; Yonglin Wang; Hongming Zhang; Ninghao Liu; Haitao Mi; Dong Yu

arXiv:2507.05720 · cs.LG · July 9, 2025

Open Access · 3 Reviews

TL;DR

MobileGUI-RL introduces an online training framework for mobile GUI agents that enhances scalability and robustness by using self-exploration and adaptive reinforcement learning, outperforming offline methods.

Contribution

The paper presents MobileGUI-RL, a novel online training framework for GUI agents that synthesizes learnable tasks and adapts reinforcement learning for better generalization.

Findings

01. Consistent performance improvements on three mobile-agent benchmarks.

02. Effective use of self-exploration for curriculum synthesis.

03. Enhanced robustness and scalability over offline training methods.

Abstract

Recently, there has been a surge of vision-based GUI agents designed to automate everyday mobile and web tasks. These agents interpret raw GUI screenshots and autonomously decide where to click, scroll, or type, which bypasses handcrafted rules and app-specific APIs. However, most existing methods train GUI agents in offline environments using pre-collected trajectories. This approach limits scalability, causes overfitting to specific UI templates, and leads to brittle policies when faced with unseen environments. We present MobileGUI-RL, a scalable framework that trains GUI agents in an online environment. MobileGUI-RL contains two key components. It (i) synthesizes a curriculum of learnable tasks through self-exploration and filtering, and (ii) adapts GRPO to GUI navigation with trajectory-aware advantages and composite rewards that balance task success and execution efficiency.…
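
To make component (ii) more concrete, the following is a minimal sketch of how a composite reward and a group-relative, trajectory-level advantage could be computed. It is not the paper's implementation: the abstract does not give the reward formula, so `composite_reward`, its `efficiency_weight` parameter, and the linear step penalty are illustrative assumptions. The group normalization follows the standard GRPO recipe (subtract the group mean, divide by the group standard deviation), and broadcasting one advantage to every step of a trajectory is an assumed reading of "trajectory-aware".

```python
from statistics import mean, pstdev


def composite_reward(success: bool, num_steps: int, max_steps: int,
                     efficiency_weight: float = 0.5) -> float:
    """Hypothetical composite reward: task success plus an efficiency bonus.

    Failed rollouts get 0. Successful rollouts get 1 plus a bonus that
    shrinks linearly as more of the step budget is used. This form is an
    assumption for illustration, not the formula from the paper.
    """
    if not success:
        return 0.0
    efficiency = 1.0 - num_steps / max_steps
    return 1.0 + efficiency_weight * efficiency


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standard GRPO-style normalization within one group of rollouts."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def per_step_advantages(trajectory_lengths: list[int],
                        advantages: list[float]) -> list[list[float]]:
    """Broadcast each trajectory-level advantage to all steps of that trajectory.

    Assumed interpretation of 'trajectory-aware': every action in a rollout
    shares the advantage computed from the rollout's final outcome.
    """
    return [[a] * n for n, a in zip(trajectory_lengths, advantages)]


# Four rollouts of the same task: (success, steps taken), with a 10-step budget.
rollouts = [(True, 5), (True, 10), (False, 10), (False, 10)]
rewards = [composite_reward(ok, steps, max_steps=10) for ok, steps in rollouts]
advantages = group_relative_advantages(rewards)
step_advantages = per_step_advantages([steps for _, steps in rollouts], advantages)
```

In this sketch the shorter successful rollout receives the largest advantage, the slower success a smaller one, and both failures share the same negative value, which is one way a reward can balance task success against execution efficiency.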

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 4

Strengths

- The experimental results show improvements for both Qwen2.5-VL 7B and 32B.
- The paper shifts to online RL and mentions scalable data-collection environments.
- The case study offers an intuitive illustration of the agent fixing a failure that the pre-trained LLMs made.

Weaknesses

- Several directly relevant recent works, such as DigiRL and DistRL, are not compared, despite clear overlap in online RL settings and robust GUI agents. This absence undermines the claims of uniqueness and leaves the positioning statements in Section 2 unsupported.
- Details on key components are missing. For example, (1) the scalable environment is not well described. How does one ‘align compute-intensive environment simulation with CPU and model training with GPUs’? (2) The text-based world

Reviewer 02 · Rating 4 · Confidence 4

Strengths

1) I like how the authors designed this entire framework: the automatic curriculum design and reward design.
2) I can see the potential for someone with physical disabilities to use voice assistance to control their mobile phone, as long as safety is ensured. The algorithm proposed in this paper can be highly useful for personalizing an AI model for individual users.
3) The paper shows that the proposed framework achieves significant improvements over prior techniques in terms of success rates and sampl

Weaknesses

**Major (my reason for not providing a higher score)**
1) No examples of inference with the trained system. The example shown in Section D of the appendix does not give the details of the user request or explain whether the trajectory was generated by a trained AI model. Also, the paper does not provide any details about the tasks identified through its synthetic task generation method.
2) There is no comparison of the proposed trajectory-aware MobGRPO against standard GRPO. If I understa

Reviewer 03 · Rating 2 · Confidence 4

Strengths

- This paper introduces a new approach to training GUI agents in online environments using reinforcement learning, which addresses the shortcomings of previous offline methods.
- This paper designs a scalable training infrastructure with batched virtual execution on multiple Android emulators, enabling high-throughput, asynchronous data collection. This design improves both sample efficiency and policy robustness, which is an important technical strength for large-scale deployment.
- Extensive e

Weaknesses

- Several details of the study are not fully disclosed. Please refer to the question section for specifics.
- The reproducibility of the results is relatively low and needs to be improved.
- More ablation studies are needed. Please refer to the question section for details.

Taxonomy

Topics: Reinforcement Learning in Robotics · Social Robot Interaction and HRI · Multimodal Machine Learning Applications