See, Point, Refine: Multi-Turn Approach to GUI Grounding with Visual Feedback

Himangi Mittal; Gaurav Mittal; Nelson Daniel Troncoso; Yu Hu

arXiv:2604.13019 · cs.CV · April 15, 2026

TL;DR

This paper introduces a multi-turn, visual-feedback-driven approach to GUI grounding in coding environments that significantly improves precision over single-shot methods through iterative refinement.

Contribution

The paper presents a novel iterative refinement mechanism for GUI grounding that improves accuracy and robustness in dense, dynamic coding interfaces.

Findings

1. Multi-turn refinement outperforms single-shot models in click accuracy.
2. The approach adapts to dynamic UI changes through visual feedback.
3. Task success rate improves significantly across multiple benchmarks.
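The click-accuracy finding above can be made concrete with a small metric sketch. It assumes, as is common for GUI grounding benchmarks, that a click counts as correct when its final coordinates fall inside the target element's bounding box; the paper's exact scoring rule is not given here, and the function name `click_accuracy` and the toy coordinates are hypothetical, not results from the paper.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in screen pixels


def click_accuracy(clicks: Sequence[Point], targets: Sequence[BBox]) -> float:
    """Fraction of final clicks landing inside their target's bounding box.

    Assumption: hit-in-box scoring; the paper's own metric may differ.
    """
    if len(clicks) != len(targets):
        raise ValueError("need exactly one target box per click")
    if not clicks:
        return 0.0
    hits = sum(
        1
        for (x, y), (x0, y0, x1, y1) in zip(clicks, targets)
        if x0 <= x <= x1 and y0 <= y <= y1
    )
    return hits / len(clicks)


if __name__ == "__main__":
    # Toy, made-up data: a 10x10 px target and a 4x4 px target.
    targets = [(100, 200, 110, 210), (0, 0, 4, 4)]
    single_shot = [(98, 205), (2, 2)]  # first click misses by 2 px
    refined = [(105, 205), (2, 2)]     # after refinement, both land inside
    print(click_accuracy(single_shot, targets), click_accuracy(refined, targets))
```

On these toy inputs the single-shot clicks score 0.5 and the refined clicks score 1.0, which illustrates how a few pixels of displacement on small IDE targets can dominate the metric.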

Abstract

Computer Use Agents (CUAs) fundamentally rely on graphical user interface (GUI) grounding to translate language instructions into executable screen actions, but editing-level grounding in coding interfaces, where sub-pixel accuracy is required to interact with dense IDE elements, remains underexplored. Existing approaches typically rely on single-shot coordinate prediction, which lacks a mechanism for error correction and often fails in high-density interfaces. In this technical report, we conduct an empirical study of pixel-precise cursor localization in coding environments. Instead of executing a single step, our agent engages in an iterative refinement process, using visual feedback from previous attempts to reach the target element. This closed-loop grounding mechanism allows the agent to self-correct displacement errors and adapt to dynamic UI changes. We evaluate our…
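The closed-loop idea in the abstract (predict a point, observe where the cursor landed, correct the displacement, repeat) can be sketched as a simple control loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the real agent would query a vision-language model with a screenshot showing the rendered cursor, whereas here that model is replaced by a hypothetical mock (`make_mock_model`) that sees the cursor position directly and corrects only part of the remaining error each turn. The names `refine_cursor`, `Box`, `max_turns`, and `step` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of the target UI element, in screen pixels."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, p: Point) -> bool:
        return self.x0 <= p[0] <= self.x1 and self.y0 <= p[1] <= self.y1

    def center(self) -> Point:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


# Hypothetical model interface: given the current cursor position (a stand-in
# for a screenshot with the cursor drawn on it) and the history of attempts,
# return a refined point, or None if the cursor already looks on target.
GroundingModel = Callable[[Point, List[Point]], Optional[Point]]


def refine_cursor(model: GroundingModel, initial: Point,
                  max_turns: int = 5) -> Tuple[Point, List[Point]]:
    """Re-query the model with feedback until it stops or the turn budget ends."""
    cursor = initial
    trajectory = [initial]
    for _ in range(max_turns):
        proposal = model(cursor, trajectory)
        if proposal is None:  # model judges the current position correct
            break
        cursor = proposal
        trajectory.append(cursor)
    return cursor, trajectory


def make_mock_model(target: Box, step: float = 0.6) -> GroundingModel:
    """Mock VLM (assumption): closes a fraction `step` of the error per turn."""
    cx, cy = target.center()

    def model(cursor: Point, history: List[Point]) -> Optional[Point]:
        if target.contains(cursor):
            return None
        dx, dy = cx - cursor[0], cy - cursor[1]  # displacement seen in feedback
        return (cursor[0] + step * dx, cursor[1] + step * dy)

    return model


if __name__ == "__main__":
    target = Box(100, 200, 110, 210)
    final, path = refine_cursor(make_mock_model(target), initial=(60, 245))
    print(final, len(path) - 1, target.contains(final))
```

With these toy numbers the mock lands inside the 10x10 px target after three corrections, whereas a single-shot prediction (the first point alone) misses it. The `max_turns` budget keeps the loop bounded when the model never converges.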

Code & Models

Repositories

microsoft/precision-cua-bench (GitHub)
