EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience

Taofeng Xue; Chong Peng; Mianqiu Huang; Linsen Guo; Tiancheng Han; Haozhe Wang; Jianing Wang; Xiaocheng Zhang; Xin Yang; Dengchang Zhao; Jinrui Ding; Xiandi Ma; Yuchen Xie; Peng Pei; Xunliang Cai; Xipeng Qiu

arXiv:2601.15876 · cs.AI · January 26, 2026

TL;DR

EvoCUA introduces an evolutionary learning framework for native computer use agents. It combines synthetic data generation, large-scale experience collection, and dynamic policy regulation to surpass prior results on existing benchmarks and improve agent capabilities.

Contribution

The paper presents EvoCUA, a novel evolutionary approach integrating synthetic task generation and scalable experience collection to enhance native computer use agents.

Findings

01 EvoCUA achieves a 56.7% success rate on the OSWorld benchmark.

02 It outperforms previous open-source models such as OpenCUA-72B.

03 The approach generalizes well across foundation models of different scales.

Abstract

The development of native computer-use agents (CUA) represents a significant leap in multimodal AI. However, their potential is currently bottlenecked by the constraints of static data scaling. Existing paradigms relying primarily on passive imitation of static datasets struggle to capture the intricate causal dynamics inherent in long-horizon computer tasks. In this work, we introduce EvoCUA, a native computer use agentic model. Unlike static imitation, EvoCUA integrates data generation and policy optimization into a self-sustaining evolutionary cycle. To mitigate data scarcity, we develop a verifiable synthesis engine that autonomously generates diverse tasks coupled with executable validators. To enable large-scale experience acquisition, we design a scalable infrastructure orchestrating tens of thousands of asynchronous sandbox rollouts. Building on these massive trajectories, we…
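
The abstract's pairing of generated tasks with executable validators, and of many sandbox rollouts running asynchronously, can be illustrated with a toy sketch. This is not the authors' implementation: `SyntheticTask`, `make_rename_task`, `scripted_policy`, `rollout`, and `collect_experience` are hypothetical names, the "sandbox" is just an in-memory dict of file names, and the policy is a hard-coded stand-in for an agent.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, str]  # hypothetical sandbox state: file name -> contents


@dataclass
class SyntheticTask:
    """A generated task paired with an executable validator (assumed shape)."""
    instruction: str
    params: Dict[str, str]
    validator: Callable[[State], bool]  # checks the final sandbox state


def make_rename_task(src: str, dst: str) -> SyntheticTask:
    """Build a toy 'rename file' task whose success is machine-checkable."""
    def validator(state: State) -> bool:
        return dst in state and src not in state

    return SyntheticTask(f"Rename {src} to {dst}", {"src": src, "dst": dst}, validator)


def scripted_policy(task: SyntheticTask, state: State) -> None:
    """Toy stand-in for an agent: it only knows how to rename .txt files."""
    src, dst = task.params["src"], task.params["dst"]
    if src.endswith(".txt") and src in state:
        state[dst] = state.pop(src)


async def rollout(task: SyntheticTask, policy, initial_state: State) -> bool:
    """Run one episode in a fresh copy of the state and validate the outcome."""
    state = dict(initial_state)
    await asyncio.sleep(0)  # placeholder for real sandbox latency
    policy(task, state)
    return task.validator(state)


async def collect_experience(
    tasks: List[SyntheticTask], policy, initial_state: State, max_concurrency: int = 4
) -> List[bool]:
    """Run rollouts concurrently, capped by a semaphore; results keep task order."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(task: SyntheticTask) -> bool:
        async with sem:
            return await rollout(task, policy, initial_state)

    return await asyncio.gather(*(bounded(t) for t in tasks))


if __name__ == "__main__":
    tasks = [make_rename_task("a.txt", "c.txt"), make_rename_task("b.csv", "d.csv")]
    outcomes = asyncio.run(
        collect_experience(tasks, scripted_policy, {"a.txt": "x", "b.csv": "y"})
    )
    print(outcomes)  # [True, False]
```

The validator gives each trajectory a pass/fail label without human annotation, which is what makes the generated tasks usable as a training signal in a loop like the one the abstract describes.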

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Malware Detection Techniques · Software System Performance and Reliability