UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding

Shuquan Lian; Yuhang Wu; Jia Ma; Yifan Ding; Zihan Song; Bingqi Chen; Xiawu Zheng; Hui Li; Rongrong Ji

arXiv:2507.22025·cs.AI·April 9, 2026

TL;DR

UI-AGILE enhances GUI agents by improving training with a reward suite and cropping strategies, and inference with decomposed grounding, achieving state-of-the-art accuracy on benchmark datasets.

Contribution

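The paper introduces training techniques (a continuous grounding reward, a "Simple Thinking" reward, and cropping-based resampling) and an inference technique (decomposed grounding with selection) for GUI agents, improving grounding accuracy over existing methods.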

Findings

01

Achieves 23% higher grounding accuracy on the ScreenSpot-Pro benchmark.

02

Introduces a continuous reward and cropping strategy for better training.

03

Decomposed grounding improves accuracy on high-resolution displays.

Abstract

The emergence of Multimodal Large Language Models (MLLMs) has driven significant advances in Graphical User Interface (GUI) agent capabilities. Nevertheless, existing GUI agent training and inference techniques still suffer from a dilemma for reasoning designs, ineffective reward, and visual noise. To address these issues, we introduce UI-AGILE for enhancing GUI agents at both training and inference. For training, we propose a suite of improvements to the Reinforcement Fine-Tuning (RFT) process: 1) a continuous reward function to incentivize high-precision grounding; 2) a "Simple Thinking" reward to balance planning with speed and grounding accuracy; and 3) a cropping-based resampling strategy to mitigate the sparse reward problem and improve learning on complex tasks. For inference, we present decomposed grounding with selection to dramatically improve grounding accuracy on…
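To make the first training idea concrete, here is a minimal sketch contrasting a binary grounding reward with a continuous one. The abstract does not state the reward formula, so the linear falloff, the 0.5 floor at the box edge, and the function names `binary_reward` and `continuous_reward` are all assumptions for illustration, not the authors' definition.

```python
def inside(point, box):
    """Return True if point (x, y) lies within box (x1, y1, x2, y2)."""
    x, y = point
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2


def binary_reward(point, box):
    """Baseline: 1.0 for any hit inside the target box, 0.0 otherwise."""
    return 1.0 if inside(point, box) else 0.0


def continuous_reward(point, box):
    """Hypothetical continuous reward (an assumption, not the paper's formula).

    Outside the box the reward is 0.0. Inside, it falls linearly from 1.0 at
    the box center to 0.5 at the box edge, so more central clicks score higher.
    """
    if not inside(point, box):
        return 0.0
    x, y = point
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = (x2 - x1) / 2, (y2 - y1) / 2
    # Normalized offset from the center along each axis, in [0, 1].
    dx = abs(x - cx) / half_w if half_w > 0 else 0.0
    dy = abs(y - cy) / half_h if half_h > 0 else 0.0
    return 1.0 - 0.5 * max(dx, dy)


if __name__ == "__main__":
    target = (0, 0, 100, 100)
    for click in [(50, 50), (75, 50), (100, 50), (120, 50)]:
        print(click, binary_reward(click, target), continuous_reward(click, target))
```

Under this assumed form, the binary reward cannot distinguish a click at the center from one at the edge, while the continuous reward ranks them, which is the kind of signal a precision-oriented reward is meant to provide.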

Code & Models

Repositories

KDEGroup/UI-AGILE
github

Models

Datasets

KDEGroup/UI-AGILE-Data
dataset · 49 downloads