Coding with Eyes: Visual Feedback Unlocks Reliable GUI Code Generating and Debugging

Zhilin Liu; Ye Huang; Ting Xie; Ruizhi Zhang; Wen Li; Lixin Duan

arXiv:2604.19750·cs.SE·April 23, 2026

TL;DR

This paper introduces a new benchmark and a vision-feedback system for improving GUI code generation and debugging using visual information, addressing limitations of text-based methods.

Contribution

It presents InteractGUI Bench for evaluating GUI tasks and VF-Coder, a visual feedback system that enhances debugging by perceiving visual cues.

Findings

01 VF-Coder improves success rate from 21.68% to 28.29%.

02 VF-Coder raises visual score from 0.4284 to 0.5584.

03 Benchmark enables fine-grained evaluation of GUI interaction and visual structure.
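
To put the two reported gains on a common footing, the short sketch below computes the absolute and relative change for each metric. Only the four numbers come from the findings above; the helper name `gain` and the choice to report a relative change are illustrative assumptions, not something the paper summary specifies.

```python
def gain(before: float, after: float) -> tuple[float, float]:
    """Return (absolute change, relative change) between two metric values.

    Hypothetical helper for illustration; not part of the paper.
    """
    absolute = after - before
    relative = absolute / before
    return absolute, relative


# Figures taken from the findings above.
success_abs, success_rel = gain(21.68, 28.29)    # success rate, in percent
visual_abs, visual_rel = gain(0.4284, 0.5584)    # visual score, as reported

print(f"Success rate: +{success_abs:.2f} points ({success_rel:.1%} relative)")
print(f"Visual score: +{visual_abs:.4f} ({visual_rel:.1%} relative)")
```

Under this reading, the success rate rises by 6.61 percentage points and the visual score by 0.13, each roughly a 30% relative improvement over its starting value.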

Abstract

Recent advances in Large Language Model (LLM)-based agents have shown remarkable progress in code generation. However, current agent methods mainly rely on text-output-based feedback (e.g., command-line outputs) for multi-round debugging and struggle with graphical user interface (GUI) programs that involve visual information. This is mainly due to two limitations: 1) GUI programs are event-driven, yet existing methods cannot simulate user interactions to trigger GUI element logic; 2) GUI programs possess visual attributes, making it difficult for text-based approaches to assess whether the rendered interface meets user needs. To systematically address these challenges, we first introduce InteractGUI Bench, a novel benchmark comprising 984 commonly used real-world desktop GUI application tasks designed for fine-grained evaluation of both interaction logic and visual structure. Furthermore, we…
