GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding
Fei Tang, Zhangxuan Gu, Zhengxi Lu, Xuyang Liu, Shuheng Shen, Changhua Meng, Wen Wang, Wenqi Zhang, Yongliang Shen, Weiming Lu, Jun Xiao, Yueting Zhuang

TL;DR
This paper introduces GUI-G$^2$, a Gaussian reward framework for GUI grounding that models spatial interactions as continuous distributions, significantly improving localization accuracy and robustness over binary reward methods.
Contribution
The paper proposes a novel Gaussian reward modeling approach for GUI grounding, transforming sparse binary rewards into dense continuous signals for better spatial reasoning.
Findings
Outperforms state-of-the-art UI-TARS-72B by up to 24.7% on ScreenSpot-Pro.
Provides more robust and generalized GUI element localization.
Enhances spatial reasoning in GUI interaction tasks.
Abstract
Graphical User Interface (GUI) grounding maps natural language instructions to precise interface locations for autonomous interaction. Current reinforcement learning approaches use binary rewards that treat elements as hit-or-miss targets, creating sparse signals that ignore the continuous nature of spatial interactions. Motivated by human clicking behavior that naturally forms Gaussian distributions centered on target elements, we introduce GUI Gaussian Grounding Rewards (GUI-G$^2$), a principled reward framework that models GUI elements as continuous Gaussian distributions across the interface plane. GUI-G$^2$ incorporates two synergistic mechanisms: Gaussian point rewards model precise localization through exponentially decaying distributions centered on element centroids, while coverage rewards assess spatial alignment by measuring the overlap between predicted Gaussian…
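The contrast the abstract draws between binary hit-or-miss rewards and Gaussian point rewards can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the (x1, y1, x2, y2) box format, and the choice of setting each standard deviation to a fraction `alpha` of the element's width and height are all assumptions made for illustration. The coverage reward is left out because its description is truncated above.

```python
import math

def binary_reward(point, box):
    """Hit-or-miss reward: 1.0 if the point lies inside the box, else 0.0."""
    x, y = point
    x1, y1, x2, y2 = box
    return 1.0 if (x1 <= x <= x2 and y1 <= y <= y2) else 0.0

def gaussian_point_reward(point, box, alpha=0.5):
    """Reward that decays exponentially with distance from the box centroid.

    alpha is a hypothetical scale factor: sigma_x = alpha * width and
    sigma_y = alpha * height (an assumption, not taken from the text above).
    """
    x, y = point
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    sigma_x = alpha * (x2 - x1)
    sigma_y = alpha * (y2 - y1)
    if sigma_x <= 0 or sigma_y <= 0:
        raise ValueError("box must have positive width and height, alpha > 0")
    # Squared, per-axis normalized distance from the centroid.
    z = ((x - cx) / sigma_x) ** 2 + ((y - cy) / sigma_y) ** 2
    return math.exp(-0.5 * z)

if __name__ == "__main__":
    box = (100, 100, 200, 140)   # hypothetical button, in pixels
    near_miss = (210, 120)       # 10 px to the right of the box edge
    print(binary_reward(near_miss, box))
    print(gaussian_point_reward(near_miss, box))
```

For the near miss, the binary reward is zero while the Gaussian reward stays positive and grows as the prediction moves toward the centroid, which is the dense signal the abstract refers to.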
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Robot Manipulation and Learning
