OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution
Le Zhang, Yixiong Xiao, Xinjiang Lu, Jingjia Cao, Yusai Zhao, Jingbo Zhou, Lang An, Zikan Feng, Wanxiang Sha, Yu Shi, Congxi Xiao, Jian Xiong, Yankai Zhang, Hua Wu, Haifeng Wang

TL;DR
OmegaUse is a general-purpose GUI agent model that autonomously executes tasks on mobile and desktop platforms. It combines high-quality data, a decoupled training paradigm, and a Mixture-of-Experts backbone, and reports state-of-the-art performance on multiple benchmarks.
Contribution
The paper introduces OmegaUse, a novel general-purpose GUI agent with a new data synthesis framework, a two-stage training paradigm, and a Mixture-of-Experts architecture for improved autonomous task execution.
Findings
Achieves 96.3% on the ScreenSpot-V2 benchmark.
Reaches 79.1% step success on AndroidControl.
Attains 74.24% step success on ChiM-Nav.
Abstract
Graphical User Interface (GUI) agents show great potential for enabling foundation models to complete real-world tasks, revolutionizing human-computer interaction and improving human productivity. In this report, we present OmegaUse, a general-purpose GUI agent model for autonomous task execution on both mobile and desktop platforms, supporting computer-use and phone-use scenarios. Building an effective GUI agent model relies on two factors: (1) high-quality data and (2) effective training methods. To address these, we introduce a carefully engineered data-construction pipeline and a decoupled training paradigm. For data construction, we leverage rigorously curated open-source datasets and introduce a novel automated synthesis framework that integrates bottom-up autonomous exploration with top-down taxonomy-guided generation to create high-fidelity synthetic data. For training, to…
Taxonomy
Topics: Personal Information Management and User Behavior · Multimodal Machine Learning Applications · Context-Aware Activity Recognition Systems
