GUI-360$^\circ$: A Comprehensive Dataset and Benchmark for Computer-Using Agents
Jian Mu, Chaoyun Zhang, Chiming Ni, Lu Wang, Bo Qiao, Kartik Mathur, Qianhui Wu, Yuhang Xie, Xiaojun Ma, Mengyu Zhou, Si Qin, Liqun Li, Yu Kang, Minghua Ma, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

TL;DR
GUI-360$^\circ$ is a large-scale dataset and benchmark suite for developing and evaluating computer-using agents, addressing key gaps with automated data collection, multi-modal annotations, and comprehensive task evaluation.
Contribution
The paper introduces GUI-360$^\circ$, a novel large-scale dataset and benchmark with automated pipelines for multi-modal GUI tasks, enabling progress in computer-using agents.
Findings
State-of-the-art models show significant gaps in GUI grounding and action prediction.
Supervised fine-tuning and reinforcement learning improve model performance.
Current models still fall short of human-level reliability.
Abstract
We introduce GUI-360, a large-scale, comprehensive dataset and benchmark suite designed to advance computer-using agents (CUAs). CUAs present unique challenges, and their development is constrained by three persistent gaps: a scarcity of real-world CUA tasks, the lack of automated collection-and-annotation pipelines for multi-modal trajectories, and the absence of a unified benchmark that jointly evaluates GUI grounding, screen parsing, and action prediction. GUI-360 addresses these gaps with an LLM-augmented, largely automated pipeline for query sourcing, environment-template construction, task instantiation, batched execution, and LLM-driven quality filtering. The released corpus contains over 1.2M executed action steps across thousands of trajectories in popular Windows office applications, and includes full-resolution screenshots, accessibility metadata when available, instantiated…
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics
