GUI-Eyes: Tool-Augmented Perception for Visual Grounding in GUI Agents
Chen Chen, Jiawei Shao, Dakuan Lu, Haoyi Hu, Xiangcheng Liu, Hantao Yao, Wu Liu

TL;DR
GUI-Eyes introduces an active perception framework for GUI agents that strategically uses visual tools and staged reasoning to improve accuracy and data efficiency in GUI understanding tasks.
Contribution
The paper proposes a novel RL-based framework with a two-stage perception strategy and a dense reward function for active visual perception in GUI tasks.
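To illustrate what "dense" means here, the sketch below contrasts a sparse hit-or-miss grounding reward with a spatially smooth one. It is not the paper's reward function, whose exact form is not given in this summary. The Gaussian shape, the `sigma_scale` parameter, and both function names are assumptions chosen only to show the idea.

```python
import math

# Hypothetical illustration only: the summary does not specify the paper's
# reward. A box is (left, top, right, bottom) in pixels.

def sparse_reward(px, py, box):
    """Return 1.0 if the predicted point lies inside the target box, else 0.0."""
    left, top, right, bottom = box
    return 1.0 if (left <= px <= right and top <= py <= bottom) else 0.0

def dense_reward(px, py, box, sigma_scale=0.5):
    """Gaussian falloff around the box center (an assumed shape).

    Offsets are normalized by the box size, so the reward does not depend
    on the absolute scale of the target element.
    """
    left, top, right, bottom = box
    cx, cy = (left + right) / 2.0, (top + bottom) / 2.0
    sx = max((right - left) * sigma_scale, 1e-6)
    sy = max((bottom - top) * sigma_scale, 1e-6)
    d2 = ((px - cx) / sx) ** 2 + ((py - cy) / sy) ** 2
    return math.exp(-0.5 * d2)

target = (100.0, 100.0, 200.0, 140.0)
near_miss = (210.0, 120.0)   # just outside the right edge
far_miss = (300.0, 120.0)    # well outside the box
```

With the sparse variant, both misses score 0.0 and give the policy no signal about which one is closer. The dense variant ranks the near miss above the far miss, which is the property a spatially continuous reward is meant to provide.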
Findings
Achieves 44.8% grounding accuracy on ScreenSpot-Pro with only 3k labeled samples.
Significantly outperforms both supervised and RL baselines.
Demonstrates the effectiveness of tool-aware active perception for GUI understanding.
Abstract
Recent advances in vision-language models (VLMs) and reinforcement learning (RL) have driven progress in GUI automation. However, most existing methods rely on static, one-shot visual inputs and passive perception, lacking the ability to adaptively determine when, whether, and how to observe the interface. We present GUI-Eyes, a reinforcement learning framework for active visual perception in GUI tasks. To acquire more informative observations, the agent learns to make strategic decisions on both whether and how to invoke visual tools, such as cropping or zooming, within a two-stage reasoning process. To support this behavior, we introduce a progressive perception strategy that decomposes decision-making into coarse exploration and fine-grained grounding, coordinated by a two-level policy. In addition, we design a spatially continuous reward function tailored to tool usage, which…
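The coarse-then-fine flow described in the abstract implies a coordinate bookkeeping step: a point predicted inside a cropped or zoomed view has to be mapped back to the full screenshot. The sketch below shows one way that mapping could work. It is a minimal illustration under stated assumptions, not the authors' implementation: the `Box` type, the function names, and the convention that the fine stage outputs normalized coordinates in [0, 1] are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Box:
    """Axis-aligned crop region in full-screenshot pixel coordinates (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float

def crop_point_to_screen(crop: Box, u: float, v: float) -> Tuple[float, float]:
    """Map a point (u, v), normalized to [0, 1] within the crop, to screen pixels."""
    x = crop.left + u * (crop.right - crop.left)
    y = crop.top + v * (crop.bottom - crop.top)
    return x, y

def two_stage_click(crop: Optional[Box], u: float, v: float,
                    screen_w: float, screen_h: float) -> Tuple[float, float]:
    """Resolve the final click position.

    crop is None when the agent decides not to invoke a visual tool; the
    prediction is then interpreted relative to the whole screenshot.
    Otherwise the coarse stage has selected `crop` and the fine stage
    has predicted (u, v) inside it.
    """
    region = crop if crop is not None else Box(0.0, 0.0, screen_w, screen_h)
    return crop_point_to_screen(region, u, v)

# Coarse stage picked a 200x200 region; fine stage points inside it.
zoomed = two_stage_click(Box(100.0, 200.0, 300.0, 400.0), 0.5, 0.25, 1920.0, 1080.0)
# No tool call: the prediction refers to the full 1920x1080 screenshot.
direct = two_stage_click(None, 0.5, 0.5, 1920.0, 1080.0)
```

Treating the no-tool case as a crop that covers the whole screen keeps a single code path for both decisions, which is one simple way to make "whether to zoom" an ordinary action rather than a special case.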
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics
