GUIDE: Resolving Domain Bias in GUI Agents through Real-Time Web Video Retrieval and Plug-and-Play Annotation

Rui Xie; Zhi Gao; Chenrui Shi; Zirui Shang; Lu Chen; and Qing Li

arXiv:2603.26266 · cs.AI · April 1, 2026

TL;DR

GUIDE is a training-free framework that enhances GUI agents by autonomously acquiring domain-specific knowledge from web tutorial videos, significantly reducing domain bias and improving task performance.

Contribution

GUIDE introduces a novel retrieval-augmented annotation pipeline that enables GUI agents to learn domain-specific expertise without retraining, improving their real-world applicability.

Findings

01 GUIDE improves GUI agent performance by over 5% on OSWorld benchmarks.

02 It reduces execution steps without modifying model parameters.

03 The framework demonstrates broad applicability as a plug-and-play component.

Abstract

Large vision-language models have endowed GUI agents with strong general capabilities for interface understanding and interaction. However, due to insufficient exposure to domain-specific software operation data during training, these agents exhibit significant domain bias: they lack familiarity with the specific operation workflows (planning) and UI element layouts (grounding) of particular applications, which limits their real-world task performance. In this paper, we present GUIDE (GUI Unbiasing via Instructional-Video Driven Expertise), a training-free, plug-and-play framework that resolves GUI agent domain bias by autonomously acquiring domain-specific expertise from web tutorial videos through a retrieval-augmented automated annotation pipeline. GUIDE introduces two key innovations. First, a subtitle-driven Video-RAG pipeline unlocks video semantics through subtitle analysis,…
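
To make the idea of subtitle-driven retrieval concrete, the sketch below ranks tutorial videos by how closely their subtitle text matches a task description, then prepends the best match to an agent prompt without touching model weights. It is an illustration only, not the GUIDE pipeline: the abstract is truncated here, so the paper's actual retrieval and annotation steps are not shown. The function names (`rank_videos`, `build_prompt`), the video ids, and the bag-of-words cosine score are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_videos(task, subtitles, top_k=1):
    """Return up to top_k video ids whose subtitles best match the task.

    `subtitles` maps a hypothetical video id to its subtitle text.
    Videos with no token overlap are dropped.
    """
    query = Counter(tokenize(task))
    scored = [
        (cosine(query, Counter(tokenize(text))), vid)
        for vid, text in subtitles.items()
    ]
    # Highest score first; ties broken by video id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [vid for score, vid in scored[:top_k] if score > 0]


def build_prompt(task, subtitles, top_k=1):
    """Prepend retrieved subtitle text to the task (no retraining involved)."""
    hits = rank_videos(task, subtitles, top_k)
    notes = "\n".join(f"- {subtitles[vid]}" for vid in hits)
    return f"Task: {task}\nTutorial notes:\n{notes}"
```

A real system would likely replace the bag-of-words score with a learned text embedding and would process the retrieved video further before handing anything to the agent; the point here is only that the added knowledge enters through the prompt rather than through the model's parameters.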
