IW-Bench: Evaluating Large Multimodal Models for Converting Image-to-Web

Hongcheng Guo; Wei Zhang; Junhao Chen; Yaonan Gu; Jian Yang; Junjia Du; Shaosheng Cao; Binyuan Hui; Tianyu Liu; Jianxin Ma; Chang Zhou; Zhoujun Li

arXiv:2409.18980 · cs.CL · December 4, 2025

TL;DR

This paper introduces IW-Bench, a benchmark for evaluating how well large multimodal models convert images into web pages. It focuses on element accuracy and layout fidelity, and it also proposes a five-hop Chain-of-Thought prompting method.

Contribution

It proposes novel metrics for element and layout accuracy, curates a large benchmark dataset, and introduces a five-hop Chain-of-Thought prompting method for improved performance.

Findings

01. Existing models show room for improvement in element completeness.

02. Layout accuracy remains a challenge for current models.

03. The benchmark provides a new standard for evaluating image-to-web conversion.

Abstract

Recent advancements in large multimodal models have led to significant strides in image comprehension. Despite these advancements, there is no robust benchmark specifically for assessing the Image-to-Web conversion proficiency of these models. First, it is essential to ensure the integrity of the generated web elements, which fall into visible and invisible categories. Previous evaluation methods (e.g., BLEU) are notably susceptible to large score changes caused by the presence of invisible elements in web pages. Furthermore, it is crucial to measure the layout information of web pages, that is, the positional relationships between elements, which previous work overlooks. To address these challenges, we have curated and aligned a benchmark of images and corresponding web code (IW-BENCH). Specifically, we propose the Element Accuracy, which…
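As a rough illustration of why text-overlap scores react to invisible elements, the sketch below compares two HTML snippets that show the same visible content, one of which carries an extra `<script>` block. It uses a simplified clipped bigram precision as a stand-in for BLEU; it is not full BLEU and not the paper's Element Accuracy metric. The tokenizer, the function names, and both HTML snippets are assumptions made up for this example.

```python
import re
from collections import Counter


def tokenize(html: str) -> list[str]:
    # Crude, hypothetical tokenizer: words/numbers plus a few markup symbols.
    return re.findall(r'[A-Za-z0-9]+|[<>/="]', html)


def ngram_precision(candidate: list[str], reference: list[str], n: int = 2) -> float:
    # Clipped n-gram precision: the share of candidate n-grams that also
    # appear in the reference (one ingredient of BLEU, without the brevity
    # penalty or the averaging over several n-gram orders).
    cand = Counter(zip(*[candidate[i:] for i in range(n)]))
    ref = Counter(zip(*[reference[i:] for i in range(n)]))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


# Reference page and a candidate with the same visible content
# plus an invisible <script> element (illustrative snippets only).
reference = "<body><h1>Hello</h1><p>World</p></body>"
with_invisible = (
    "<body><script>var a = 1</script><h1>Hello</h1><p>World</p></body>"
)

score_identical = ngram_precision(tokenize(reference), tokenize(reference))
score_invisible = ngram_precision(tokenize(with_invisible), tokenize(reference))

print(f"identical code:        {score_identical:.3f}")
print(f"extra invisible node:  {score_invisible:.3f}")
```

The second score is lower even though a reader of the rendered page would see no difference, which is the kind of sensitivity the abstract attributes to code-level metrics such as BLEU.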

Taxonomy

Topics: Multimedia Communication and Technology

Methods: Self-Organizing Map