Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
Sukmin Yun, Haokun Lin, Rusiru Thushara, Mohammad Qazim Bhat, Yongxin Wang, Zutao Jiang, Mingkai Deng, Jinhong Wang, Tianhua Tao, Junbo Li, Haonan Li, Preslav Nakov, Timothy Baldwin, Zhengzhong Liu, Eric P. Xing, Xiaodan Liang, and Zhiqiang Shen

TL;DR
Web2Code introduces a large-scale webpage-to-code dataset and evaluation framework to improve multimodal large language models' ability to understand webpage images and generate accurate HTML code, addressing a current gap in multimodal AI capabilities.
Contribution
The paper presents a new dataset and evaluation framework specifically designed for webpage understanding and code generation tasks in multimodal LLMs, leveraging pretrained LLMs for dataset enhancement.
Findings
The dataset improves MLLMs' webpage understanding and code generation.
Models show limited performance on webpage-to-code tasks without specialized training.
The dataset benefits general visual domain tasks as well.
Abstract
Multimodal large language models (MLLMs) have shown impressive success across modalities such as image, video, and audio in a variety of understanding and generation tasks. However, current MLLMs are surprisingly poor at understanding webpage screenshots and generating their corresponding HTML code. To address this problem, we propose Web2Code, a benchmark consisting of a new large-scale webpage-to-code dataset for instruction tuning and an evaluation framework for the webpage understanding and HTML code translation abilities of MLLMs. For dataset construction, we leverage pretrained LLMs to enhance existing webpage-to-code datasets as well as generate a diverse pool of new webpages rendered into images. Specifically, the inputs are webpage images and instructions, while the responses are the webpage's HTML code. We further include diverse natural language QA pairs about the…
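To make the input/response format described in the abstract concrete, here is a minimal sketch of how one webpage-to-code sample might be represented. The class name, field names, file path, and record layout are all hypothetical illustrations; the abstract only states that inputs are a webpage image plus an instruction and that the response is the webpage's HTML code. This is not the dataset's actual schema.

```python
from dataclasses import dataclass
import json


@dataclass
class WebpageCodeSample:
    # All field names are hypothetical; the source only says that inputs are
    # a webpage image and an instruction, and the response is HTML code.
    image_path: str   # rendered screenshot of the webpage
    instruction: str  # natural-language prompt given to the model
    html: str         # target response: the webpage's HTML code


def to_instruction_record(sample: WebpageCodeSample) -> dict:
    """Turn one sample into a generic (input, response) record."""
    return {
        "input": {"image": sample.image_path, "instruction": sample.instruction},
        "response": sample.html,
    }


# Illustrative values only; not taken from the dataset.
sample = WebpageCodeSample(
    image_path="screenshots/example.png",
    instruction="Generate the HTML code for this webpage screenshot.",
    html="<html><body><h1>Hello</h1></body></html>",
)
record = to_instruction_record(sample)
print(json.dumps(record, indent=2))
```

Keeping the image reference and the instruction together under one input key mirrors the abstract's split between inputs and responses; a real instruction-tuning pipeline would likely use whatever conversation format its training framework expects.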
Taxonomy
Topics: Natural Language Processing Techniques · Translation Studies and Practices · Wikis in Education and Collaboration
