GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding

Fei Tang; Zhangxuan Gu; Zhengxi Lu; Xuyang Liu; Shuheng Shen; Changhua Meng; Wen Wang; Wenqi Zhang; Yongliang Shen; Weiming Lu; Jun Xiao; Yueting Zhuang

arXiv:2507.15846 · cs.LG · July 29, 2025

TL;DR

This paper introduces GUI-G$^2$, a Gaussian reward framework for GUI grounding that models spatial interactions as continuous distributions, significantly improving localization accuracy and robustness over binary reward methods.

Contribution

The paper proposes a novel Gaussian reward modeling approach for GUI grounding, transforming sparse binary rewards into dense continuous signals for better spatial reasoning.

Findings

01. Outperforms state-of-the-art UI-TARS-72B by up to 24.7% on ScreenSpot-Pro.

02. Provides more robust and generalized GUI element localization.

03. Enhances spatial reasoning in GUI interaction tasks.

Abstract

Graphical User Interface (GUI) grounding maps natural language instructions to precise interface locations for autonomous interaction. Current reinforcement learning approaches use binary rewards that treat elements as hit-or-miss targets, creating sparse signals that ignore the continuous nature of spatial interactions. Motivated by human clicking behavior that naturally forms Gaussian distributions centered on target elements, we introduce GUI Gaussian Grounding Rewards (GUI-G$^2$), a principled reward framework that models GUI elements as continuous Gaussian distributions across the interface plane. GUI-G$^2$ incorporates two synergistic mechanisms: Gaussian point rewards model precise localization through exponentially decaying distributions centered on element centroids, while coverage rewards assess spatial alignment by measuring the overlap between predicted Gaussian…
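
To make the contrast between a binary hit-or-miss reward and a Gaussian point reward concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the function names, the `(x1, y1, x2, y2)` box format, and the choice of setting each standard deviation to `alpha` times the element's width or height are all hypothetical. The coverage reward is not sketched because its definition is cut off in the abstract above.

```python
import math


def binary_reward(point, box):
    """Hit-or-miss reward: 1.0 if the point lies inside the box, else 0.0."""
    x, y = point
    x1, y1, x2, y2 = box
    return 1.0 if (x1 <= x <= x2 and y1 <= y <= y2) else 0.0


def gaussian_point_reward(point, box, alpha=0.5):
    """Reward that decays exponentially with distance from the box centroid.

    ASSUMPTION: sigma_x and sigma_y are alpha * width and alpha * height.
    This scaling is chosen for illustration and is not taken from the source.
    """
    x, y = point
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0  # element centroid
    sigma_x = max(alpha * (x2 - x1), 1e-6)      # guard against zero-size boxes
    sigma_y = max(alpha * (y2 - y1), 1e-6)
    z = ((x - cx) / sigma_x) ** 2 + ((y - cy) / sigma_y) ** 2
    return math.exp(-0.5 * z)


if __name__ == "__main__":
    box = (0.0, 0.0, 100.0, 40.0)  # hypothetical element bounds in pixels
    for point in [(50.0, 20.0), (90.0, 20.0), (110.0, 20.0), (300.0, 20.0)]:
        print(point, binary_reward(point, box), round(gaussian_point_reward(point, box), 4))
```

In this sketch a click just outside the element still receives a nonzero reward that shrinks smoothly as the click moves away, whereas the binary reward drops straight to zero at the boundary. That is the dense-versus-sparse distinction the abstract describes.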

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Robot Manipulation and Learning