PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks

Junxian Li; Kai Liu; Leyang Chen; Weida Wang; Zhixin Wang; Jiaqi Xu; Fan Li; Renjing Pei; Linghe Kong; Yulun Zhang

arXiv:2602.06663 · cs.CV · April 21, 2026

TL;DR

PlanViz introduces a benchmark for evaluating image generation and editing capabilities of models in computer-use planning tasks like route planning and UI display, emphasizing spatial reasoning and procedural understanding.

Contribution

The paper presents PlanViz, a new benchmark with a task-adaptive scoring method, to assess how unified multimodal models (UMMs) perform on computer-use planning tasks that involve image generation and editing.

Findings

01 Experiments reveal current limitations of UMMs in planning tasks.

02 The benchmark highlights opportunities for improving spatial reasoning in models.

03 PlanScore effectively measures correctness, visual quality, and efficiency of generated images.

Abstract

Unified multimodal models (UMMs) have shown impressive capabilities in generating natural images and supporting multimodal reasoning. However, their potential for supporting computer-use planning tasks, which are closely tied to everyday life, remains underexplored. Image generation and editing in computer-use tasks require capabilities such as spatial reasoning and procedural understanding, and it is still unknown whether UMMs possess the capabilities needed to complete these tasks. Therefore, we propose PlanViz, a new benchmark designed to evaluate image generation and editing for computer-use tasks. To this end, we focus on sub-tasks that arise frequently in daily life and require planning. Specifically, three representative sub-tasks are designed: route planning, work diagramming, and web&UI displaying. We address the challenge of ensuring data quality by curating…

Code & Models

Repositories

lijunxian111/PlanViz (GitHub)
