Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification

Zehai He; Wenyi Hong; Zhen Yang; Ziyang Pan; Mingdao Liu; Xiaotao Gu; Jie Tang

arXiv:2603.26648 · cs.SE · April 2, 2026

TL;DR

Vision2Web is a hierarchical benchmark for evaluating the visual website development capabilities of AI models. It covers static, interactive, and full-stack tasks and comes with a new agent-based verification paradigm.

Contribution

It introduces a hierarchical benchmark built from real-world websites, together with a novel agent verification method, for systematically evaluating models on website development.

Findings

01. State-of-the-art models show significant performance gaps.

02. Models struggle with full-stack website development.

03. The benchmark covers 193 tasks across 16 categories.

Abstract

Recent advances in large language models have improved the capabilities of coding agents, yet systematic evaluation of complex, end-to-end website development remains limited. To address this gap, we introduce Vision2Web, a hierarchical benchmark for visual website development that spans static UI-to-code generation, interactive multi-page frontend reproduction, and long-horizon full-stack website development. The benchmark is constructed from real-world websites and comprises 193 tasks across 16 categories, with 918 prototype images and 1,255 test cases. To support flexible, thorough, and reliable evaluation, we propose a workflow-based agent verification paradigm built on two complementary components: a GUI agent verifier and a VLM-based judge. We evaluate multiple visual language models instantiated under different coding-agent frameworks, revealing substantial performance…
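The abstract describes the verification paradigm only at a high level, so the following is a minimal sketch under our own assumptions, not the Vision2Web repository's code or scoring formula. It shows one way to record the two signals the abstract names (pass/fail outcomes of test cases run by a GUI agent verifier, and VLM-judge scores on prototype images) for each task, then average them per task level. Every class, field, and function name here is hypothetical, the [0, 1] score range is assumed, and the two signals are deliberately kept separate rather than merged with an invented weighting.

```python
from dataclasses import dataclass, field
from enum import Enum
from statistics import mean
from typing import Optional


class Level(Enum):
    # The three task tiers named in the abstract.
    STATIC = "static UI-to-code generation"
    INTERACTIVE = "interactive multi-page frontend reproduction"
    FULL_STACK = "long-horizon full-stack website development"


@dataclass
class TestCaseResult:
    # Hypothetical record: outcome of one workflow test case that a
    # GUI agent executed against the generated website.
    case_id: str
    passed: bool


@dataclass
class TaskVerdict:
    # Hypothetical per-task record holding both verifier signals.
    task_id: str
    level: Level
    gui_results: list[TestCaseResult] = field(default_factory=list)
    vlm_scores: list[float] = field(default_factory=list)  # assumed in [0, 1]

    def functional_pass_rate(self) -> Optional[float]:
        # Fraction of GUI-agent test cases that passed; None if none were run.
        if not self.gui_results:
            return None
        return sum(r.passed for r in self.gui_results) / len(self.gui_results)

    def visual_score(self) -> Optional[float]:
        # Mean VLM-judge score over prototype images; None if none were judged.
        return mean(self.vlm_scores) if self.vlm_scores else None


def _mean_or_none(values) -> Optional[float]:
    # Average the non-None values, or return None if there are none.
    present = [v for v in values if v is not None]
    return mean(present) if present else None


def summarize(verdicts: list[TaskVerdict]) -> dict:
    # Per-level averages of (functional pass rate, visual score); levels
    # with no tasks are omitted from the result.
    out = {}
    for level in Level:
        group = [v for v in verdicts if v.level is level]
        if group:
            out[level] = (
                _mean_or_none(v.functional_pass_rate() for v in group),
                _mean_or_none(v.visual_score() for v in group),
            )
    return out


# Usage with made-up illustrative values (not benchmark data):
verdicts = [
    TaskVerdict("t1", Level.FULL_STACK,
                [TestCaseResult("login", True), TestCaseResult("checkout", False)],
                [1.0, 0.5]),
    TaskVerdict("t2", Level.STATIC, vlm_scores=[0.5]),
]
summary = summarize(verdicts)
```

Returning None instead of 0.0 when a task has no test cases avoids penalizing tasks (for example, purely static ones) that a GUI agent would have nothing to exercise on; whether the actual benchmark handles this case the same way is not stated in the abstract.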

Code & Models

Repositories

zai-org/Vision2Web (GitHub)