UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

Han Xiao; Guozhi Wang; Yuxiang Chai; Zimu Lu; Weifeng Lin; Hao He; Lue Fan; Liuyang Bian; Rui Hu; Liang Liu; Shuai Ren; Yafei Wen; Xiaoxin Chen; Aojun Zhou; Hongsheng Li

arXiv:2505.21496·cs.CL·May 28, 2025

TL;DR

UI-Genie is a self-improving framework that enhances GUI agents by combining a reward model and iterative data generation, leading to state-of-the-art performance in mobile GUI tasks.

Contribution

The paper introduces UI-Genie, a novel self-improving approach with a new reward model and data generation pipeline for boosting MLLM-based GUI agents.

Findings

1. Achieves state-of-the-art results on multiple GUI benchmarks.
2. Generates high-quality synthetic trajectories without manual labeling.
3. Demonstrates effective self-improvement through iterative data-model enhancement.

Abstract

In this paper, we introduce UI-Genie, a self-improving framework addressing two key challenges in GUI agents: verification of trajectory outcome is challenging and high-quality training data are not scalable. These challenges are addressed by a reward model and a self-improving pipeline, respectively. The reward model, UI-Genie-RM, features an image-text interleaved architecture that efficiently processes historical context and unifies action-level and task-level rewards. To support the training of UI-Genie-RM, we develop deliberately-designed data generation strategies including rule-based verification, controlled trajectory corruption, and hard negative mining. To address the second challenge, a self-improvement pipeline progressively expands solvable complex GUI tasks by enhancing both the agent and reward models through reward-guided exploration and outcome verification in…
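
The abstract names controlled trajectory corruption as one way to build training data for the reward model but does not describe the procedure here. The following is only a minimal sketch of the general idea, assuming a trajectory is a list of steps and that a negative example is made by swapping one step for a different action. The names `Step`, `corrupt_trajectory`, and `make_reward_pairs` are hypothetical and are not taken from the paper or its repository.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    """One GUI action. Both fields are illustrative assumptions."""
    action: str  # e.g. "click", "type", "swipe"
    target: str  # hypothetical UI element identifier


def corrupt_trajectory(
    traj: List[Step], rng: random.Random, candidates: List[Step]
) -> Tuple[List[Step], int]:
    """Return a copy of `traj` with exactly one step replaced by a
    different step, plus the index that was corrupted."""
    if not traj:
        raise ValueError("trajectory must not be empty")
    idx = rng.randrange(len(traj))
    # Only consider replacements that differ from the original step.
    wrong = [c for c in candidates if c != traj[idx]]
    if not wrong:
        raise ValueError("no candidate differs from the original step")
    corrupted = list(traj)
    corrupted[idx] = rng.choice(wrong)
    return corrupted, idx


def make_reward_pairs(
    traj: List[Step], rng: random.Random, candidates: List[Step]
) -> List[Tuple[List[Step], int]]:
    """Label the original trajectory 1 and its corrupted copy 0
    (an assumed binary labeling, chosen for illustration)."""
    corrupted, _ = corrupt_trajectory(traj, rng, candidates)
    return [(traj, 1), (corrupted, 0)]
```

In this sketch the corrupted copy differs from the original at a single known index, so the same pair could in principle supply both a step-level and a trajectory-level negative label. How the paper actually selects and labels corruptions is not stated in the text above.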

Code & Models

Repositories

euphoria16/ui-genie (PyTorch, official)

Taxonomy

Topics: Mobile Agent-Based Network Management · Context-Aware Activity Recognition Systems · Peer-to-Peer Network Technologies