UltraCUA: A Foundation Model for Computer Use Agents with Hybrid Action
Yuhao Yang, Zhen Yang, Zi-Yi Dou, Anh Nguyen, Keen You, Omar Attia, Andrew Szot, Michael Feng, Ram Ramrakhya, Alexander Toshev, Chao Huang, Yinfei Yang, Zhe Gan

TL;DR
UltraCUA introduces a hybrid action foundation model that combines primitive GUI operations with high-level tool execution, significantly improving the robustness and efficiency of computer-use agents across diverse tasks.
Contribution
It presents a novel hybrid action framework, a scalable data pipeline, and a two-stage training process, enabling agents to seamlessly integrate GUI primitives with tool-based actions.
Findings
22% performance improvement on OSWorld
11% faster execution compared to existing methods
21.7% success rate on WindowsAgentArena
Abstract
Computer-use agents face a fundamental limitation. They rely exclusively on primitive GUI actions (click, type, scroll), creating brittle execution chains prone to cascading failures. While API-driven agents harness rich capabilities through structured interfaces and tools, computer-use agents remain constrained to low-level visual interactions. We present UltraCUA, a foundation model that transcends this limitation through hybrid action-seamlessly unifying primitive GUI operations with high-level tool execution. Our innovation rests on four critical advances. First, an automated pipeline extracts and scales tool capabilities from software documentation and code repositories. Second, a synthetic data engine produces 17,000+ verifiable tasks capturing real-world computer-use complexity. Third, comprehensive hybrid action trajectory collection incorporates both GUI primitives and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Advanced Software Engineering Methodologies · Reinforcement Learning in Robotics
