CutVerse: A Compositional GUI Agents Benchmark for Media Post-Production Editing

Haobo Hu; Xiangwu Guo; Zhiheng Chen; Difei Gao; Haotian Liu; Libiao Jin; Qi Mao

arXiv:2605.19484 · cs.CV · May 20, 2026

TL;DR

CutVerse introduces a comprehensive benchmark for evaluating autonomous GUI agents in professional media post-production tasks, highlighting current limitations and guiding future research in complex, multimodal workflows.

Contribution

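The paper presents a new benchmark built from expert demonstrations across 7 professional applications, together with a lightweight parser that converts screen recordings and low-level interaction logs into structured GUI action trajectories. It addresses the lack of systematic evaluation for GUI agents in professional media post-production workflows.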

Findings

01. Existing agents achieve only 36.0% success on complex tasks

02. Current models show promise in spatial grounding and multimodal alignment

03. Long-horizon reliability remains a significant challenge

Abstract

While GUI agents have made significant progress in web navigation and basic operating system tasks, their capabilities in professional creative workflows remain largely underexplored. To bridge this gap, we introduce CutVerse, a benchmark designed to systematically evaluate autonomous GUI agents in realistic media post-production environments. We curate expert demonstrations across 7 professional applications (e.g., Premiere Pro, Photoshop), covering 186 complex, long-horizon tasks grounded in authentic editing workflows that involve dense multimodal interfaces and tightly coupled interaction sequences. To support scalable evaluation, we develop a lightweight parser that transforms raw screen recordings and low-level interaction logs into structured, compositional GUI action trajectories with precise grounding. Extensive evaluations reveal that existing agents achieve only 36.0% task…
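To make the parser idea in the abstract more concrete, here is a minimal sketch of how low-level interaction logs might be grouped into higher-level GUI actions. It is not the CutVerse parser: the `RawEvent` and `Action` types, the event kinds, the `parse_events` function, and the `drag_threshold` parameter are all hypothetical, chosen only for illustration, and the sketch ignores screen recordings entirely.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RawEvent:
    """Hypothetical low-level log entry (not the CutVerse log format)."""
    t: float            # timestamp in seconds
    kind: str           # "mouse_down", "mouse_up", or "key"
    x: int = 0
    y: int = 0
    key: str = ""


@dataclass
class Action:
    """Hypothetical structured action produced by the sketch parser."""
    kind: str                       # "click", "drag", or "type"
    start: Tuple[int, int] = (0, 0)
    end: Tuple[int, int] = (0, 0)
    text: str = ""


def parse_events(events: List[RawEvent], drag_threshold: int = 5) -> List[Action]:
    """Group raw events into click, drag, and type actions.

    A press/release pair becomes a drag when the pointer moved more than
    `drag_threshold` pixels (Manhattan distance), otherwise a click.
    Consecutive key events are merged into a single type action.
    """
    actions: List[Action] = []
    down: Optional[RawEvent] = None
    typed: List[str] = []

    def flush_typed() -> None:
        # Emit any buffered keystrokes as one "type" action.
        if typed:
            actions.append(Action(kind="type", text="".join(typed)))
            typed.clear()

    for ev in sorted(events, key=lambda e: e.t):
        if ev.kind == "key":
            typed.append(ev.key)
        elif ev.kind == "mouse_down":
            flush_typed()
            down = ev
        elif ev.kind == "mouse_up" and down is not None:
            moved = abs(ev.x - down.x) + abs(ev.y - down.y)
            kind = "drag" if moved > drag_threshold else "click"
            actions.append(Action(kind=kind, start=(down.x, down.y), end=(ev.x, ev.y)))
            down = None

    flush_typed()
    return actions


# Illustrative log: a click, two keystrokes, then a horizontal drag.
sample_log = [
    RawEvent(t=0.0, kind="mouse_down", x=10, y=10),
    RawEvent(t=0.1, kind="mouse_up", x=11, y=10),
    RawEvent(t=0.5, kind="key", key="h"),
    RawEvent(t=0.6, kind="key", key="i"),
    RawEvent(t=1.0, kind="mouse_down", x=100, y=200),
    RawEvent(t=1.8, kind="mouse_up", x=300, y=200),
]
trajectory = parse_events(sample_log)
```

In this sketch, `trajectory` holds three actions in order: a click, a type action with the text "hi", and a drag from (100, 200) to (300, 200). A real parser for professional editing software would presumably also need to ground each action to a UI element, which this example does not attempt.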

Code & Models

Repositories

cuc-mipg/CutVerse (GitHub)
