OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Qiushi Sun, Kanzhi Cheng, Zichen Ding, Chuanyang Jin, Yian Wang, Fangzhi Xu, Zhenyu Wu, Chengyou Jia, Liheng Chen, Zhoumianze Liu, Ben Kao, Guohao Li, Junxian He, Yu Qiao, Zhiyong Wu

TL;DR
OS-Genesis introduces a reverse task synthesis pipeline for GUI data generation. It produces more diverse, higher-quality trajectories for training GUI agents and thereby improves their performance on challenging benchmarks.
Contribution
The paper proposes reverse task synthesis, a method that improves the diversity and quality of GUI agent training data. It targets two limitations of existing data collection: human supervision is resource-intensive, and executing pre-defined tasks cannot guarantee data quality.
Findings
Significantly improves GUI agent performance on benchmarks.
Generates more diverse and higher-quality trajectory data.
Outperforms existing synthetic data methods.
Abstract
Graphical User Interface (GUI) agents powered by Vision-Language Models (VLMs) have demonstrated human-like computer control capability. Despite their utility in advancing digital automation, a critical bottleneck persists: collecting high-quality trajectory data for training. Common practices for collecting such data rely on human supervision or synthetic data generation through executing pre-defined tasks, which are either resource-intensive or unable to guarantee data quality. Moreover, these methods suffer from limited data diversity and significant gaps between synthetic data and real-world environments. To address these challenges, we propose OS-Genesis, a novel GUI data synthesis pipeline that reverses the conventional trajectory collection process. Instead of relying on pre-defined tasks, OS-Genesis enables agents first to perceive environments and perform step-wise…
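The reversal described in the abstract can be illustrated with a toy sketch: interact first, with no task in mind, and only afterwards write an instruction that the recorded interaction fulfils. Everything below is hypothetical and is not the authors' implementation. The `TOY_GUI` screen graph, the `Step` record, and the template-based `derive_task` function (a stand-in for whatever model writes the instruction) are assumptions made for illustration, since the abstract is truncated before the pipeline details.

```python
import random
from dataclasses import dataclass

# Hypothetical toy GUI: each screen maps its available actions to the next screen.
TOY_GUI = {
    "home": {"tap Settings": "settings", "tap Search": "search"},
    "settings": {"tap Wi-Fi": "wifi", "tap Back": "home"},
    "search": {"tap Back": "home"},
    "wifi": {"tap Back": "settings"},
}


@dataclass
class Step:
    before: str   # screen observed before acting
    action: str   # action that was executed
    after: str    # screen observed after acting


def explore(gui, start, num_steps, seed=0):
    """Interact step by step with no task in mind and record each transition."""
    rng = random.Random(seed)
    screen, steps = start, []
    for _ in range(num_steps):
        action = rng.choice(sorted(gui[screen]))
        next_screen = gui[screen][action]
        steps.append(Step(screen, action, next_screen))
        screen = next_screen
    return steps


def derive_task(step):
    """Assumed stand-in: write an instruction after seeing what the action did."""
    return f"From the {step.before} screen, open the {step.after} screen."


def synthesize(gui, start, num_steps, seed=0):
    """Pair each observed transition with a retrospectively written task."""
    return [(derive_task(step), step) for step in explore(gui, start, num_steps, seed)]


if __name__ == "__main__":
    for task, step in synthesize(TOY_GUI, "home", num_steps=4):
        print(f"{task}  <-  {step.action}")
```

Because each instruction is written from a transition that was actually observed, every synthetic task in this sketch is executable in the environment by construction, which is the property a pipeline based on pre-defined tasks cannot guarantee.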
Code & Models
- 🤗 OS-Copilot/OS-Genesis-7B-AC · model · 7 dl · ♡ 7
- 🤗 OS-Copilot/OS-Genesis-8B-AC · model · 5 dl · ♡ 4
- 🤗 OS-Copilot/OS-Genesis-4B-AC · model · 8 dl · ♡ 7
- 🤗 OS-Copilot/OS-Genesis-4B-AW · model · 3 dl
- 🤗 OS-Copilot/OS-Genesis-8B-AW · model · 2 dl
- 🤗 OS-Copilot/OS-Genesis-7B-AW · model · 8 dl · ♡ 1
- 🤗 OS-Copilot/OS-Genesis-4B-WA · model · 7 dl · ♡ 1
- 🤗 OS-Copilot/OS-Genesis-8B-WA · model · 8 dl · ♡ 1
- 🤗 OS-Copilot/OS-Genesis-7B-WA · model · 6 dl
Taxonomy
Topics: Robotics and Automated Systems · Social Robot Interaction and HRI · Context-Aware Activity Recognition Systems
