CANVAS: A Benchmark for Vision-Language Models on Tool-Based User Interface Design
Daeheon Jeong, Seoyeon Byun, Kihoon Son, Dae Hyun Kim, Juho Kim

TL;DR
CANVAS is a new benchmark that evaluates vision-language models' ability to perform tool-based user interface design tasks, including replication and modification, in real design software, in order to assess their potential to assist designers.
Contribution
This paper introduces CANVAS, the first comprehensive benchmark for assessing VLMs' performance on tool-based UI design, comprising 598 design tasks and an analysis of model capabilities.
Findings
Leading models use tools more strategically, which improves design quality
Models share common error patterns, which point to directions for future improvement
Benchmark covers 598 tasks across 30 UI categories
Abstract
User interface (UI) design is an iterative process in which designers progressively refine their work with design software such as Figma or Sketch. Recent advances in vision-language models (VLMs) with tool invocation suggest these models can operate design software to edit a UI design through iteration. Understanding and enhancing this capacity is important, as it highlights VLMs' potential to collaborate with designers within conventional software. However, as no existing benchmark evaluates tool-based design performance, this capacity remains unknown. To address this, we introduce CANVAS, a benchmark for VLMs on tool-based user interface design. Our benchmark contains 598 tool-based design tasks paired with ground-truth references sampled from 3.3K mobile UI designs across 30 function-based categories (e.g., onboarding, messaging). In each task, a VLM updates the design step-by-step…
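The abstract describes a VLM operating design software by invoking tools to update a design step by step. The following Python sketch illustrates that kind of iterative tool-invocation loop under loudly stated assumptions: the `Design` state, the `create_rectangle` and `set_fill` tools, `run_episode`, and the scripted policy standing in for a VLM are all hypothetical illustrations and are not CANVAS's actual tool set, interface, or evaluation procedure.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Design:
    # Hypothetical design state: element id -> property dict.
    elements: dict[str, dict[str, object]] = field(default_factory=dict)


@dataclass
class ToolCall:
    name: str
    args: dict[str, object]


# Hypothetical tools standing in for design-software operations.
def create_rectangle(design: Design, element_id: str, x: int, y: int, w: int, h: int) -> None:
    design.elements[element_id] = {"type": "rect", "x": x, "y": y, "w": w, "h": h}


def set_fill(design: Design, element_id: str, color: str) -> None:
    design.elements[element_id]["fill"] = color


TOOLS: dict[str, Callable[..., None]] = {
    "create_rectangle": create_rectangle,
    "set_fill": set_fill,
}

Policy = Callable[[Design, list[ToolCall]], Optional[ToolCall]]


def run_episode(policy: Policy, design: Design, max_steps: int = 10) -> list[ToolCall]:
    """Let the policy edit the design one tool call at a time, up to max_steps."""
    history: list[ToolCall] = []
    for _ in range(max_steps):
        # Assumption: a real VLM would likely also observe a rendered screenshot.
        call = policy(design, history)
        if call is None:  # the policy signals that the design is finished
            break
        TOOLS[call.name](design, **call.args)  # apply the edit to the shared state
        history.append(call)
    return history


def scripted_policy(design: Design, history: list[ToolCall]) -> Optional[ToolCall]:
    # Fixed plan used in place of a real model, for illustration only.
    plan = [
        ToolCall("create_rectangle", {"element_id": "header", "x": 0, "y": 0, "w": 375, "h": 64}),
        ToolCall("set_fill", {"element_id": "header", "color": "#1E88E5"}),
    ]
    return plan[len(history)] if len(history) < len(plan) else None
```

For example, `run_episode(scripted_policy, Design())` applies two edits and stops when the policy returns `None`. Representing each edit as a `ToolCall` record rather than calling tools directly keeps the full edit history available for later inspection or replay.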
Taxonomy
Topics: Interactive and Immersive Displays · Usability and User Interface Design · Innovative Human-Technology Interaction
