InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization

Yuhang Liu; Zeyu Liu; Shuanghe Zhu; Pengxiang Li; Congkai Xie; Jiasheng Wang; Xavier Hu; Xiaotian Han; Jianbo Yuan; Xinyao Wang; Shengyu Zhang; Hongxia Yang; Fei Wu

arXiv:2508.05731 · cs.AI · December 9, 2025



TL;DR

This paper introduces AEPO, a new policy optimization framework that enhances GUI grounding by improving semantic alignment through broader exploration, leading to state-of-the-art results on multiple benchmarks.

Contribution

The paper proposes AEPO, an innovative exploration strategy with a theoretically grounded reward, significantly advancing semantic alignment in GUI grounding tasks.

Findings

01. Achieved up to 9.0% improvement over baseline

02. Established new state-of-the-art on multiple benchmarks

03. Demonstrated effectiveness of broader exploration in semantic tasks

Abstract

The emergence of Multimodal Large Language Models (MLLMs) has propelled the development of autonomous agents that operate on Graphical User Interfaces (GUIs) using pure visual input. A fundamental challenge is robustly grounding natural language instructions. This requires precise spatial alignment, which accurately locates the coordinates of each element, and, more critically, correct semantic alignment, which matches the instructions to the functionally appropriate UI element. Although Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective at improving spatial alignment for these MLLMs, we find that inefficient exploration bottlenecks semantic alignment, which prevents models from learning difficult semantic associations. To address this exploration problem, we present Adaptive Exploration Policy Optimization (AEPO), a new policy optimization framework.…
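
As a rough illustration of the "verifiable reward" idea mentioned in the abstract, one common way to check a GUI grounding prediction is to test whether the predicted click point falls inside the target element's bounding box. The sketch below assumes that setup. The function name `point_in_box_reward`, the box layout `(left, top, right, bottom)`, and the binary 0/1 reward are illustrative assumptions, not details taken from the paper, and this is not the AEPO reward itself.

```python
from typing import Tuple

# Hypothetical box layout for this sketch: (left, top, right, bottom) in pixels.
Box = Tuple[float, float, float, float]


def point_in_box_reward(point: Tuple[float, float], box: Box) -> float:
    """Return 1.0 if the predicted click point lies inside the target box, else 0.0.

    This is a generic point-in-box check, assumed here as a stand-in for a
    verifiable grounding reward; it is not the reward defined in the paper.
    """
    x, y = point
    left, top, right, bottom = box
    inside = left <= x <= right and top <= y <= bottom
    return 1.0 if inside else 0.0


# Example: a button occupying x in [40, 120] and y in [10, 30].
button = (40.0, 10.0, 120.0, 30.0)
hit = point_in_box_reward((50.0, 20.0), button)   # point inside the button
miss = point_in_box_reward((10.0, 10.0), button)  # point left of the button
```

A check like this only confirms that a point landed on the labeled element. It gives no signal about how close a wrong prediction was to the right one, and under such a binary reward a model that rarely samples the correct element receives little learning signal.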


Code & Models

Models

Datasets

InfiX-ai/omniact_grounding_filtered
dataset · 37 downloads

Videos

InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization

Taxonomy

Topics: Augmented Reality Applications · Context-Aware Activity Recognition Systems