CUA-Skill: Develop Skills for Computer Using Agent

Tianyi Chen; Yinheng Li; Michael Solodko; Sen Wang; Nan Jiang; Tingyuan Cui; Junheng Hao; Jongwoo Ko; Sara Abdali; Leon Xu; Suzhen Zheng; Hao Fan; Pashmina Cameron; Justin Wagle; Kazuhito Koishida

arXiv:2601.21123 · cs.AI · February 4, 2026

Open Access

TL;DR

CUA-Skill introduces a structured, reusable skill library for computer-using agents, significantly enhancing their success rates and robustness in automating Windows tasks, and providing a scalable infrastructure for future development.

Contribution

The paper presents CUA-Skill, a large-scale, structured skill library for computer-using agents, enabling more reliable and scalable automation of Windows applications.

Findings

01. Achieves 57.5% success rate on WindowsAgentArena, outperforming prior methods.

02. Substantially improves robustness and execution success in end-to-end benchmarks.

03. Provides a scalable infrastructure for future agent development.

Abstract

Computer-Using Agents (CUAs) aim to autonomously operate computer systems to complete real-world tasks. However, existing agentic systems remain difficult to scale and lag behind human performance. A key limitation is the absence of reusable and structured skill abstractions that capture how humans interact with graphical user interfaces and how to leverage these skills. We introduce CUA-Skill, a computer-using agentic skill base that encodes human computer-use knowledge as skills coupled with parameterized execution and composition graphs. CUA-Skill is a large-scale library of carefully engineered skills spanning common Windows applications, serving as a practical infrastructure and tool substrate for scalable, reliable agent development. Built upon this skill base, we construct CUA-Skill Agent, an end-to-end computer-using agent that supports dynamic skill retrieval, argument…
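
The abstract describes skills as units coupled with parameterized execution and composition graphs, and an agent that retrieves skills dynamically. As a rough illustration of that idea only, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Skill` class, its fields, the `retrieve` function, the word-overlap scoring, and the two example skills are hypothetical and are not taken from the paper or its code.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical skill record (not the paper's schema)."""
    name: str
    description: str
    template: str  # action template with named placeholders
    # Outgoing edges in a toy composition graph: skills that may follow this one.
    next_skills: list[str] = field(default_factory=list)

    def instantiate(self, **args: str) -> str:
        # Fill the template's placeholders with concrete arguments.
        return self.template.format(**args)


def retrieve(library: dict[str, Skill], query: str) -> Skill:
    """Toy retrieval: pick the skill whose description shares the most words with the query."""
    query_words = set(query.lower().split())
    return max(
        library.values(),
        key=lambda s: len(query_words & set(s.description.lower().split())),
    )


# A two-skill library, invented for this example.
library = {
    "open_app": Skill(
        "open_app",
        "open launch a windows application",
        "launch {app}",
        ["type_text"],
    ),
    "type_text": Skill(
        "type_text",
        "type text into the focused window",
        "type '{text}'",
    ),
}

# Retrieve a skill for a request, then follow its composition edges.
skill = retrieve(library, "launch an application")
plan = [skill.instantiate(app="notepad")]
for name in skill.next_skills:
    plan.append(library[name].instantiate(text="hello"))

print(plan)
```

The sketch separates what a skill does (the template) from how it is filled in (the arguments) and from what can follow it (the graph edges). A real system would presumably use a learned retriever and richer argument handling rather than word overlap.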

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Topic Modeling · Multimodal Machine Learning Applications