ZeroGUI: Automating Online GUI Learning at Zero Human Cost

Chenyu Yang; Shiqian Su; Shi Liu; Xuan Dong; Yue Yu; Weijie Su; Xuehui Wang; Zhaoyang Liu; Jinguo Zhu; Hao Li; Wenhai Wang; Yu Qiao; Xizhou Zhu; Jifeng Dai

arXiv:2505.23762·cs.AI·May 30, 2025

TL;DR

ZeroGUI introduces an online, scalable framework that leverages Vision-Language Models to automate GUI agent training without human annotations, improving adaptability and performance in dynamic environments.

Contribution

ZeroGUI presents an online learning approach for GUI agents that combines VLM-based automatic task generation with VLM-based automatic reward estimation, reducing reliance on manual labels.

Findings

1. Significantly improves GUI agent performance on OSWorld and AndroidLab.

2. Enables continuous learning from environment interactions without human supervision.

3. Demonstrates effectiveness across multiple advanced GUI agents.

Abstract

The rapid advancement of large Vision-Language Models (VLMs) has propelled the development of pure-vision-based GUI Agents, capable of perceiving and operating Graphical User Interfaces (GUIs) to autonomously fulfill user instructions. However, existing approaches usually adopt an offline learning framework, which faces two core limitations: (1) heavy reliance on high-quality manual annotations for element grounding and action supervision, and (2) limited adaptability to dynamic and interactive environments. To address these limitations, we propose ZeroGUI, a scalable, online learning framework for automating GUI Agent training at Zero human cost. Specifically, ZeroGUI integrates (i) VLM-based automatic task generation to produce diverse training goals from the current environment state, (ii) VLM-based automatic reward estimation to assess task success without hand-crafted evaluation…
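The two components named in the abstract can be pictured as a data-collection loop: a VLM proposes tasks from the current screen, the agent attempts each one, and a VLM scores the outcome. The sketch below is only an illustration under stated assumptions, not the authors' implementation. Every name in it (`Rollout`, `collect_rollouts`, and the four stub functions) is hypothetical, the VLM calls are replaced by trivial stand-ins, and the policy update that would consume the rewards is left out because the abstract is truncated before describing it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rollout:
    """One attempted task with its action trace and estimated reward."""
    task: str
    actions: List[str]
    reward: float


def collect_rollouts(
    observe: Callable[[], str],
    propose_tasks: Callable[[str], List[str]],
    act: Callable[[str, str], str],
    judge: Callable[[str, List[str], str], float],
    max_steps: int = 3,
) -> List[Rollout]:
    """Generate tasks from the current state, attempt each, and score it."""
    rollouts = []
    # (i) Task generation: goals are proposed from the current environment state.
    for task in propose_tasks(observe()):
        actions = []
        for _ in range(max_steps):
            actions.append(act(task, observe()))
        # (ii) Reward estimation: success is judged without a hand-written checker.
        reward = judge(task, actions, observe())
        rollouts.append(Rollout(task, actions, reward))
    return rollouts


# Hypothetical stand-ins for the environment, the agent, and the two VLM roles.
def stub_observe() -> str:
    return "desktop screenshot"


def stub_propose_tasks(state: str) -> List[str]:
    return ["open settings", "create folder"]


def stub_act(task: str, state: str) -> str:
    return f"click:{task}"


def stub_judge(task: str, actions: List[str], final_state: str) -> float:
    return 1.0 if "settings" in task else 0.0


if __name__ == "__main__":
    for r in collect_rollouts(stub_observe, stub_propose_tasks, stub_act, stub_judge):
        print(r.task, len(r.actions), r.reward)
```

In a real system the stubs would wrap a GUI environment and VLM prompts, and the collected `Rollout` records would feed whatever online training step the framework uses.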

Code & Models

Repositories

opengvlab/zerogui (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Social Robot Interaction and HRI

Methods: ADaptive gradient method with the OPTimal convergence rate