AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
Zhong Zhang, Yaxi Lu, Yikun Fu, Yupeng Huo, Shenzhi Yang, Yesai Wu, Han Si, Xin Cong, Haotian Chen, Yankai Lin, Jie Xie, Wei Zhou, Wang Xu, Yuanheng Zhang, Zhou Su, Zhongwu Zhai, Xiaoming Liu, Yudong Mei, Jianming Xu, Hongyan Tian, Chongyi Wang, Chi Chen, Yuan Yao, Zhiyuan Liu

TL;DR
AgentCPM-GUI is a robust, multilingual mobile GUI agent trained with grounding-aware pre-training, supervised fine-tuning, and reinforcement learning, achieving state-of-the-art performance on multiple benchmarks.
Contribution
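The paper introduces AgentCPM-GUI, an 8B-parameter GUI agent with a comprehensive training pipeline and a compact action space for mobile environments. It targets limitations of prior work: noisy training data, poor generalization from pure imitation learning, and a focus on English-only interfaces.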
Findings
Achieves 96.9% Type-Match and 91.3% Exact-Match on benchmarks.
Outperforms existing models on five public benchmarks and a new Chinese GUI benchmark.
Demonstrates effective multilingual and cross-scenario GUI interaction.
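The Type-Match and Exact-Match figures above are step-level accuracies. The sketch below is minimal and assumes the common convention for these metrics. Type-Match only checks that the predicted action type equals the reference type. Exact-Match also requires the arguments to agree, with a click counted as correct when it lands within a normalized distance `tol` of the target. The `Action` record, the `tol=0.14` threshold, and the function names are hypothetical illustrations, not the paper's evaluation code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Action:
    # Hypothetical action record: an action type plus optional arguments.
    kind: str                                   # e.g. "click", "type", "swipe"
    point: Optional[Tuple[float, float]] = None  # normalized (x, y) in [0, 1]
    text: Optional[str] = None                  # textual argument, if any


def type_match(pred: Action, gold: Action) -> bool:
    # Type-Match: only the action type must agree.
    return pred.kind == gold.kind


def exact_match(pred: Action, gold: Action, tol: float = 0.14) -> bool:
    # Exact-Match: the type must agree AND the arguments must agree.
    if not type_match(pred, gold):
        return False
    if gold.point is not None:
        # Coordinate actions: correct if within Euclidean distance `tol`
        # (an assumed threshold) of the reference point.
        if pred.point is None:
            return False
        dx = pred.point[0] - gold.point[0]
        dy = pred.point[1] - gold.point[1]
        return (dx * dx + dy * dy) ** 0.5 <= tol
    # Other actions: compare the textual argument (None == None for
    # argument-free actions such as a hypothetical "back").
    return pred.text == gold.text


def score(preds: Sequence[Action], golds: Sequence[Action]) -> Tuple[float, float]:
    # Returns (type_match_rate, exact_match_rate) over aligned steps.
    if not golds or len(preds) != len(golds):
        raise ValueError("need equally sized, non-empty prediction/reference lists")
    n = len(golds)
    tm = sum(type_match(p, g) for p, g in zip(preds, golds)) / n
    em = sum(exact_match(p, g) for p, g in zip(preds, golds)) / n
    return tm, em
```

Exact-Match can never exceed Type-Match under this definition, because every exact match is also a type match. That ordering is consistent with the reported 96.9% versus 91.3%.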
Abstract
Recent progress in large language model agents has opened new possibilities for automating tasks through graphical user interfaces (GUIs), especially in mobile environments where intelligent interaction can greatly enhance usability. However, practical deployment of such agents remains constrained by several key challenges. Existing training data is often noisy and lacks semantic diversity, which hinders the learning of precise grounding and planning. Models trained purely by imitation tend to overfit to seen interface patterns and fail to generalize to unfamiliar scenarios. Moreover, most prior work focuses on English interfaces while overlooking the growing diversity of non-English applications, such as those in the Chinese mobile ecosystem. In this work, we present AgentCPM-GUI, an 8B-parameter GUI agent built for robust and efficient on-device GUI interaction. Our training pipeline…
