GUIDE: A Benchmark for Understanding and Assisting Users in Open-Ended GUI Tasks

Saelyne Yang; Jaesang Yu; Yi-Hao Peng; Kevin Qinghong Lin; Jae Won Cho; Yale Song; Juho Kim

arXiv:2603.25864 · cs.CV · March 30, 2026

TL;DR

GUIDE introduces a comprehensive benchmark dataset for evaluating AI models' ability to understand user behavior, infer intent, and assist in open-ended GUI tasks across various software, highlighting current model limitations.

Contribution

This paper presents GUIDE, a novel benchmark with annotated GUI user demonstrations, to evaluate AI's understanding of user intent and ability to assist in complex, open-ended tasks.

Findings

01. All models struggled with behavior state and help prediction accuracy.

02. Providing user context significantly improved help prediction performance.

03. Structured user understanding is critical for effective GUI assistance.

Abstract

Graphical User Interface (GUI) agents have the potential to assist users in interacting with complex software (e.g., PowerPoint, Photoshop). While prior research has primarily focused on automating user actions through clicks and keystrokes, this paradigm overlooks human intention: users value the ability to explore, iterate, and refine their ideas while maintaining agency. To move beyond automation and toward collaboration, GUI agents must understand what users are doing and why. We introduce GUIDE (GUI User Intent Detection Evaluation), a benchmark that evaluates AI models on their ability to perceive user behavior, infer intent, and provide assistance in open-ended GUI tasks. GUIDE consists of 67.5 hours of screen recordings from 120 novice user demonstrations with think-aloud narrations, across 10 software applications. GUIDE defines three tasks - (i) Behavior State Detection, (ii) Intent…

Code & Models

Repositories

https://guide-bench.github.io
