ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents

Xuhao Hu; Xi Zhang; Haiyang Xu; Kyle Qiao; Jingyi Yang; Xuanjing Huang; Jing Shao; Ming Yan; Jieping Ye

arXiv:2605.12481·cs.AI·May 13, 2026

TL;DR

ToolCUA is an end-to-end agent that learns optimal GUI-Tool path selection using a staged training paradigm, improving decision accuracy and execution efficiency in hybrid action spaces.

Contribution

It introduces a novel pipeline for synthesizing diverse GUI-Tool trajectories and a staged training approach combining supervised, reinforcement, and online learning.

Findings

01. Achieves 46.85% accuracy on OSWorld-MCP, a 66% improvement over baseline.

02. Improves GUI-only performance by 3.9%, showing effective GUI-Tool orchestration.

03. Demonstrates training in hybrid action spaces as a promising paradigm for digital agents.

Abstract

Computer Use Agents (CUAs) can act through both atomic GUI actions, such as click and type, and high-level tool calls, such as API-based file operations, but this hybrid action space often leaves them uncertain about when to continue with GUI actions or switch to tools, leading to suboptimal execution paths. This difficulty stems from the scarcity of high-quality interleaved GUI-Tool trajectories, the cost and brittleness of collecting real tool trajectories, and the lack of trajectory-level supervision for GUI-Tool path selection. In this paper, we propose ToolCUA, an end-to-end agent designed to learn optimal GUI-Tool path selection through a staged training paradigm. We first introduce an Interleaved GUI-Tool Trajectory Scaling Pipeline that repurposes abundant static GUI trajectories and synthesizes a grounded tool library, enabling diverse GUI-Tool trajectories without manual…
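To make the hybrid action space described in the abstract concrete, the following is a minimal illustrative sketch, not the paper's implementation. It models a trajectory as a sequence of atomic GUI actions and high-level tool calls, and computes two simple descriptive statistics over it. All names here (`GuiAction`, `ToolCall`, `count_switches`, `tool_ratio`, and the `rename_file` tool) are hypothetical and chosen only for illustration; the source does not specify ToolCUA's action schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


@dataclass
class GuiAction:
    """Atomic GUI action such as click or type (hypothetical schema)."""
    kind: str
    args: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ToolCall:
    """High-level tool call such as an API-based file operation (hypothetical schema)."""
    name: str
    args: Dict[str, Any] = field(default_factory=dict)


# A step in an interleaved GUI-Tool trajectory is either kind of action.
Action = Union[GuiAction, ToolCall]


def count_switches(trajectory: List[Action]) -> int:
    """Count how often consecutive steps change between GUI and tool mode."""
    return sum(type(a) is not type(b) for a, b in zip(trajectory, trajectory[1:]))


def tool_ratio(trajectory: List[Action]) -> float:
    """Fraction of steps that are tool calls (0.0 for an empty trajectory)."""
    if not trajectory:
        return 0.0
    return sum(isinstance(a, ToolCall) for a in trajectory) / len(trajectory)


# Assumed example: two GUI steps, one tool call, then one more GUI step.
example: List[Action] = [
    GuiAction("click", {"x": 120, "y": 48}),
    GuiAction("type", {"text": "report.csv"}),
    ToolCall("rename_file", {"src": "report.csv", "dst": "report_2026.csv"}),
    GuiAction("click", {"x": 300, "y": 200}),
]

if __name__ == "__main__":
    print(count_switches(example), tool_ratio(example))
```

Representing both action kinds under one `Action` type mirrors the single policy choosing between GUI and tool steps at every turn; the statistics are only a way to inspect a trajectory and say nothing about whether a given path is optimal.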

Code & Models

Models

mPLUG/ToolCUA-8B
Hugging Face model · 106 downloads · 3 likes
