GUI-360$^\circ$: A Comprehensive Dataset and Benchmark for Computer-Using Agents

Jian Mu; Chaoyun Zhang; Chiming Ni; Lu Wang; Bo Qiao; Kartik Mathur; Qianhui Wu; Yuhang Xie; Xiaojun Ma; Mengyu Zhou; Si Qin; Liqun Li; Yu Kang; Minghua Ma; Qingwei Lin; Saravan Rajmohan; Dongmei Zhang

arXiv:2511.04307 · cs.AI · November 11, 2025

TL;DR

GUI-360$^\circ$ is a large-scale dataset and benchmark suite for developing and evaluating computer-using agents, addressing key gaps with automated data collection, multi-modal annotations, and comprehensive task evaluation.

Contribution

The paper introduces GUI-360$^\circ$, a novel large-scale dataset and benchmark with automated pipelines for multi-modal GUI tasks, enabling progress in computer-using agents.

Findings

01

State-of-the-art models show significant gaps in GUI grounding and action prediction.

02

Supervised fine-tuning and reinforcement learning improve model performance.

03

Current models still fall short of human-level reliability.
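To make the first finding concrete: GUI grounding is commonly scored by checking whether a model's predicted click point falls inside the bounding box of the ground-truth UI element. The sketch below implements that click-in-box accuracy under the assumption that predictions are (x, y) pixel coordinates and targets are (left, top, right, bottom) boxes; it illustrates a widely used metric and is not necessarily the exact protocol GUI-360$^\circ$ uses.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]               # predicted click (x, y) in screen pixels
BBox = Tuple[float, float, float, float]  # ground-truth element (left, top, right, bottom)


def point_in_box(point: Point, box: BBox) -> bool:
    """True if the point lies inside the box, edges included."""
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def grounding_accuracy(preds: Sequence[Point], targets: Sequence[BBox]) -> float:
    """Fraction of predictions that land inside their target element's bounding box."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    hits = sum(point_in_box(p, b) for p, b in zip(preds, targets))
    return hits / len(preds)


if __name__ == "__main__":
    preds = [(10, 10), (50, 50), (200, 5)]
    targets = [(0, 0, 20, 20), (0, 0, 40, 40), (190, 0, 210, 10)]
    print(grounding_accuracy(preds, targets))  # 2 of 3 clicks hit their target
```

A click-in-box check tolerates small coordinate offsets inside large elements but is strict for small controls such as ribbon icons, which is one reason grounding scores can differ sharply across applications.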

Abstract

We introduce GUI-360$^\circ$, a large-scale, comprehensive dataset and benchmark suite designed to advance computer-using agents (CUAs). Developing CUAs presents unique challenges and is constrained by three persistent gaps: a scarcity of real-world CUA tasks, the lack of automated collection-and-annotation pipelines for multi-modal trajectories, and the absence of a unified benchmark that jointly evaluates GUI grounding, screen parsing, and action prediction. GUI-360$^\circ$ addresses these gaps with an LLM-augmented, largely automated pipeline for query sourcing, environment-template construction, task instantiation, batched execution, and LLM-driven quality filtering. The released corpus contains over 1.2M executed action steps across thousands of trajectories in popular Windows office applications, and includes full-resolution screenshots, accessibility metadata when available, instantiated…
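The collection pipeline named in the abstract can be pictured as a chain of stages, each narrowing or transforming the previous stage's output. The sketch below is a hypothetical illustration, not the authors' implementation: environment-template construction is reduced to a prebuilt dictionary, and the template matcher, the agent, and the LLM judge are replaced by simple stand-in callables (`pick`, `fake_agent`, `fake_judge` are invented names).

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


@dataclass
class Task:
    query: str     # natural-language user request
    template: str  # hypothetical: file name of the environment template it runs against


@dataclass
class Trajectory:
    task: Task
    steps: list = field(default_factory=list)  # executed actions, e.g. "click(title)"
    success: bool = False


def source_queries(raw: Iterable[str]) -> list:
    """Query sourcing: strip whitespace, drop empties and exact duplicates (order kept)."""
    seen, out = set(), []
    for q in raw:
        q = q.strip()
        if q and q not in seen:
            seen.add(q)
            out.append(q)
    return out


def instantiate_tasks(queries, templates: dict, pick: Callable) -> list:
    """Task instantiation: pair each query with a template, or skip it if none fits."""
    tasks = []
    for q in queries:
        name = pick(q, templates)
        if name is not None:
            tasks.append(Task(q, name))
    return tasks


def execute_batched(tasks, run_agent: Callable, batch_size: int = 4) -> list:
    """Batched execution: run tasks in fixed-size chunks (sequentially here, for simplicity)."""
    trajectories = []
    for i in range(0, len(tasks), batch_size):
        trajectories.extend(run_agent(t) for t in tasks[i:i + batch_size])
    return trajectories


def quality_filter(trajectories, judge: Callable) -> list:
    """Quality filtering: keep only trajectories the judge accepts."""
    return [t for t in trajectories if judge(t)]


def demo() -> list:
    templates = {"word": "blank.docx", "excel": "sales.xlsx"}

    def pick(query: str, tpl: dict) -> Optional[str]:
        # Stand-in matcher: choose the template whose app name appears in the query.
        for app, file_name in tpl.items():
            if app in query.lower():
                return file_name
        return None

    def fake_agent(task: Task) -> Trajectory:
        # Stand-in for a real agent acting in a live Windows application.
        return Trajectory(task, ["click(title)", "type('Q3 Report')"], success=True)

    def fake_judge(traj: Trajectory) -> bool:
        # Stand-in for an LLM judge: accept successful, non-empty trajectories.
        return traj.success and len(traj.steps) > 0

    queries = source_queries([
        "Bold the title in Word", "Bold the title in Word",
        "Sum column B in Excel", "Book a flight",
    ])
    tasks = instantiate_tasks(queries, templates, pick)
    return quality_filter(execute_batched(tasks, fake_agent), fake_judge)


if __name__ == "__main__":
    print(len(demo()))  # 2: the duplicate is dropped and "Book a flight" matches no template
```

Keeping each stage a plain function over lists makes it easy to swap a stand-in for a real component (for example, an LLM call in `quality_filter`) without touching the other stages.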

Code & Models

Datasets

vyokky/GUI-360 · dataset · 58k downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics