VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation

Qijun Han; Haoqin Tu; Zijun Wang; Haoyue Dai; Yiyang Zhou; Nancy Lau; Alvaro A. Cardenas; Yuhui Xu; Ran Xu; Caiming Xiong; Zeyu Zheng; Huaxiu Yao; Yuyin Zhou; Cihang Xie

arXiv:2604.21375 · cs.CL · April 27, 2026

TL;DR

VLAA-GUI is a modular framework for GUI automation that improves success verification, recovery from failures, and online search capabilities, leading to top performance on benchmark tasks.

Contribution

The paper introduces VLAA-GUI, a novel modular framework with integrated components for stopping, recovering, and searching, enhancing GUI agent robustness and effectiveness.

Findings

01 Achieved top performance on OSWorld (77.5%) and WindowsAgentArena (61.0%).

02 Three backbones surpass human performance (72.4%) on OSWorld in a single pass.

03 All proposed components improve backbone performance and reduce wasted steps.

Abstract

Autonomous GUI agents face two fundamental challenges: early stopping, where agents prematurely declare success without verifiable evidence, and repetitive loops, where agents cycle through the same failing actions without recovery. We present VLAA-GUI, a modular GUI agentic framework built around three integrated components that guide the system on when to Stop, Recover, and Search. First, a mandatory Completeness Verifier enforces UI-observable success criteria and verification at every finish step, with an agent-level verifier that cross-examines completion claims with decision rules, rejecting those lacking direct visual evidence. Second, a mandatory Loop Breaker provides multi-tier filtering: switching interaction mode after repeated failures, forcing strategy changes after persistent screen-state recurrence, and binding reflection signals to strategy shifts. Third, an on-demand…
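The two escalation tiers the abstract names for the Loop Breaker (switching interaction mode after repeated failures, forcing a strategy change after persistent screen-state recurrence) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name, the thresholds, the string signals, and the use of a screen hash to detect recurrence are all assumptions made for illustration, and the third tier (binding reflection signals to strategy shifts) is left out.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LoopBreaker:
    """Hypothetical two-tier loop detector; thresholds are assumed, not from the paper."""

    fail_limit: int = 3    # assumed: consecutive failed actions before switching mode
    recur_limit: int = 3   # assumed: visits to the same screen before changing strategy
    consecutive_failures: int = 0
    screen_counts: Counter = field(default_factory=Counter)

    def observe(self, screen_hash: str, action_succeeded: bool) -> str:
        """Record one step and return 'continue', 'switch_mode', or 'change_strategy'."""
        self.screen_counts[screen_hash] += 1
        if action_succeeded:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1

        # Tier 2: the same screen state keeps recurring, so force a new strategy.
        if self.screen_counts[screen_hash] >= self.recur_limit:
            self.screen_counts[screen_hash] = 0
            self.consecutive_failures = 0
            return "change_strategy"

        # Tier 1: repeated failures, so switch interaction mode
        # (for example, from mouse clicks to keyboard shortcuts).
        if self.consecutive_failures >= self.fail_limit:
            self.consecutive_failures = 0
            return "switch_mode"

        return "continue"


breaker = LoopBreaker(fail_limit=2, recur_limit=3)
signals = [
    breaker.observe("screen_a", action_succeeded=False),
    breaker.observe("screen_b", action_succeeded=False),
    breaker.observe("screen_a", action_succeeded=True),
    breaker.observe("screen_a", action_succeeded=True),
]
print(signals)
```

In this sketch the recurrence check runs before the failure check, so a screen that keeps reappearing escalates straight to a strategy change even when individual actions report success. That ordering is a design choice of the sketch, not something the abstract specifies.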

Code & Models

Repositories

ucsc-vlaa/VLAA-GUI (GitHub)