EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience
Taofeng Xue, Chong Peng, Mianqiu Huang, Linsen Guo, Tiancheng Han, Haozhe Wang, Jianing Wang, Xiaocheng Zhang, Xin Yang, Dengchang Zhao, Jinrui Ding, Xiandi Ma, Yuchen Xie, Peng Pei, Xunliang Cai, Xipeng Qiu

TL;DR
EvoCUA introduces an evolutionary learning framework for native computer use agents. It combines synthetic data generation, large-scale experience collection, and dynamic policy regulation to outperform prior open-source agents on the OSWorld benchmark.
Contribution
The paper presents EvoCUA, a novel evolutionary approach integrating synthetic task generation and scalable experience collection to enhance native computer use agents.
Findings
EvoCUA achieves a 56.7% success rate on the OSWorld benchmark.
It outperforms previous open-source models like OpenCUA-72B.
The approach generalizes well across different foundation model scales.
Abstract
The development of native computer-use agents (CUA) represents a significant leap in multimodal AI. However, their potential is currently bottlenecked by the constraints of static data scaling. Existing paradigms relying primarily on passive imitation of static datasets struggle to capture the intricate causal dynamics inherent in long-horizon computer tasks. In this work, we introduce EvoCUA, a native computer use agentic model. Unlike static imitation, EvoCUA integrates data generation and policy optimization into a self-sustaining evolutionary cycle. To mitigate data scarcity, we develop a verifiable synthesis engine that autonomously generates diverse tasks coupled with executable validators. To enable large-scale experience acquisition, we design a scalable infrastructure orchestrating tens of thousands of asynchronous sandbox rollouts. Building on these massive trajectories, we…
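The pairing the abstract describes, synthesized tasks that each carry an executable validator and are rolled out asynchronously, can be pictured with a toy sketch. This is not EvoCUA's implementation: the `Task` structure, the dictionary standing in for a sandbox's final state, and the coin-flip "policy" are all hypothetical, chosen only to show how a validator can turn a rollout into a checkable success signal.

```python
import asyncio
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

# Toy stand-in for a sandbox's final state: filename -> contents (assumption).
State = Dict[str, str]


@dataclass
class Task:
    instruction: str                    # natural-language goal shown to the agent
    target: str                         # hypothetical detail the validator checks
    validator: Callable[[State], bool]  # executable check on the final state


def make_task(i: int) -> Task:
    """Hypothetical synthesized task: create a file with a given name."""
    name = f"report_{i}.txt"
    return Task(
        instruction=f"Create a file named {name}",
        target=name,
        validator=lambda state, name=name: name in state,
    )


async def rollout(task: Task, success_prob: float, rng: random.Random) -> bool:
    """Run one toy rollout and score it with the task's own validator."""
    await asyncio.sleep(0)  # yield control, as a real sandbox call would
    state: State = {}
    # Toy policy (assumption): completes the task with probability success_prob.
    if rng.random() < success_prob:
        state[task.target] = ""
    return task.validator(state)


async def collect(tasks: List[Task], success_prob: float, seed: int = 0) -> List[bool]:
    """Launch all rollouts concurrently and gather their validator verdicts."""
    rng = random.Random(seed)
    results = await asyncio.gather(
        *(rollout(t, success_prob, rng) for t in tasks)
    )
    return list(results)


if __name__ == "__main__":
    tasks = [make_task(i) for i in range(5)]
    verdicts = asyncio.run(collect(tasks, success_prob=0.5))
    print(f"success rate: {sum(verdicts) / len(verdicts):.2f}")
```

Because each verdict comes from running the validator rather than from a human label, the resulting trajectories could in principle be filtered or rewarded automatically, which is the property a verifiable synthesis engine relies on.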
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Malware Detection Techniques · Software System Performance and Reliability
