UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization

Zhengxi Lu; Fei Tang; Guangyi Liu; Kaitao Song; Xu Tan; Jin Ma; Wenqi Zhang; Weiming Lu; Jun Xiao; Yueting Zhuang; Yongliang Shen

arXiv:2604.13822 · cs.LG · April 16, 2026

TL;DR

UI-Copilot pairs a GUI agent with a lightweight copilot for memory retrieval and numerical computation, and trains the agent with tool-integrated policy optimization, improving long-horizon task performance and generalization in complex user interface interactions.

Contribution

The paper proposes UI-Copilot, a framework that combines a GUI agent with a copilot for memory retrieval and numerical computation, and introduces Tool-Integrated Policy Optimization (TIPO) to train the agent to invoke tools effectively.

Findings

01. UI-Copilot-7B achieves state-of-the-art results on MemGUI-Bench.

02. UI-Copilot-7B outperforms other 7B-scale GUI agents such as GUI-Owl-7B.

03. UI-Copilot-7B improves AndroidWorld performance by 17.1%.

Abstract

MLLM-based GUI agents have demonstrated strong capabilities in complex user interface interaction tasks. However, long-horizon scenarios remain challenging, as these agents are burdened with tasks beyond their intrinsic capabilities, suffering from memory degradation, progress confusion, and math hallucination. To address these challenges, we present UI-Copilot, a collaborative framework where the GUI agent focuses on task execution while a lightweight copilot provides on-demand assistance for memory retrieval and numerical computation. We introduce memory decoupling to separate persistent observations from transient execution context, and train the policy agent to selectively invoke the copilot as Retriever or Calculator based on task demands. To enable effective tool invocation learning, we propose Tool-Integrated Policy Optimization (TIPO), which separately optimizes tool selection…
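The division of labor described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the class names `Copilot` and `AgentContext`, the tool labels `"retriever"` and `"calculator"`, the keyword-matching retrieval, and the fixed-size step window are hypothetical and are not taken from the paper or its repository. The sketch only shows the general idea of keeping persistent observations apart from a short transient context and routing requests to one of two tools; it does not cover how the policy is trained.

```python
import ast
import operator
from dataclasses import dataclass, field

# All names here are hypothetical; this is not the UI-Copilot codebase.

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_eval(expr: str) -> float:
    """Evaluate +, -, *, / arithmetic by walking the AST (no eval())."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))


@dataclass
class Copilot:
    # Persistent observations: notes kept for the whole task.
    observations: list[str] = field(default_factory=list)

    def record(self, note: str) -> None:
        self.observations.append(note)

    def retrieve(self, query: str) -> list[str]:
        # Toy keyword match standing in for a real retrieval method.
        terms = query.lower().split()
        return [o for o in self.observations if any(t in o.lower() for t in terms)]

    def invoke(self, tool: str, argument: str):
        # Route the agent's request to one of the two assumed tools.
        if tool == "retriever":
            return self.retrieve(argument)
        if tool == "calculator":
            return safe_eval(argument)
        raise ValueError(f"unknown tool: {tool}")


@dataclass
class AgentContext:
    # Transient execution context: only the most recent steps are kept.
    window: int = 2
    recent_steps: list[str] = field(default_factory=list)

    def push(self, step: str) -> None:
        self.recent_steps = (self.recent_steps + [step])[-self.window:]


copilot = Copilot()
ctx = AgentContext(window=2)
copilot.record("Price of item A is 12.50")
copilot.record("Price of item B is 7.25")
for step in ["open shop app", "view item A", "view item B", "open notes app"]:
    ctx.push(step)

print(ctx.recent_steps)                             # ['view item B', 'open notes app']
print(copilot.invoke("retriever", "price"))         # both recorded price notes
print(copilot.invoke("calculator", "12.50 + 7.25"))  # 19.75
```

In this sketch the agent's context forgets the early shopping steps, yet the prices stay recoverable through the retriever, and the sum is computed by the calculator rather than produced by the model.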

Code & Models

Repository: zju-real/UI-Copilot (GitHub)
