Unlocking the Conversion of Web Screenshots into HTML Code with the WebSight Dataset

Hugo Laurençon; Léo Tronchon; Victor Sanh

arXiv:2403.09029·cs.HC·March 15, 2024·2 cites

TL;DR

This paper introduces WebSight, a large synthetic dataset of 2 million webpage screenshot-HTML pairs, enabling fine-tuning of vision-language models to convert screenshots into HTML code, advancing no-code web development.

Contribution

The paper presents WebSight, the first high-quality, large-scale dataset for training models to convert web screenshots into HTML, and demonstrates fine-tuning a VLM for this task.

Findings

1. VLMs can be effectively fine-tuned on WebSight to generate HTML from screenshots.

2. WebSight accelerates research in screenshot-to-HTML conversion.

3. Open-sourcing WebSight supports further advancements in no-code web development.

Abstract

Using vision-language models (VLMs) in web development presents a promising strategy to increase efficiency and unblock no-code solutions: given a screenshot or a sketch of a UI, a VLM could generate the code to reproduce it, for instance in a language like HTML. Despite the advancements in VLMs for various tasks, the specific challenge of converting a screenshot into the corresponding HTML code has been minimally explored. We posit that this is mainly due to the absence of a suitable, high-quality dataset. This work introduces WebSight, a synthetic dataset consisting of 2 million pairs of HTML code and corresponding screenshots. We fine-tune a foundational VLM on our dataset and show proficiency in converting webpage screenshots to functional HTML code. To accelerate research in this area, we open-source WebSight.
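To make the pairing concrete, the following is a minimal sketch of how one screenshot-HTML pair might be represented and turned into a fine-tuning record (image and instruction as input, HTML as target). It is an illustration only, not the authors' pipeline: the names `ScreenshotHtmlPair` and `to_training_example`, the field names, the prompt text, and the file path are all hypothetical, and rendering the screenshot is out of scope.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass(frozen=True)
class ScreenshotHtmlPair:
    """Hypothetical record: a rendered screenshot plus the HTML that produced it."""
    screenshot_path: str  # path to the image file (assumed to exist elsewhere)
    html: str             # HTML source paired with the screenshot


class _TagCollector(HTMLParser):
    """Collects opening tags as a cheap check that the target contains HTML."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def to_training_example(pair, prompt="Convert this screenshot to HTML."):
    """Build an input/target record; the prompt wording is an assumption."""
    collector = _TagCollector()
    collector.feed(pair.html)
    if not collector.tags:
        raise ValueError("target contains no HTML tags")
    return {
        "image": pair.screenshot_path,
        "prompt": prompt,
        "target": pair.html.strip(),
        "num_tags": len(collector.tags),
    }


pair = ScreenshotHtmlPair(
    screenshot_path="page_0001.png",  # hypothetical file name
    html="<html><body><h1>Hello</h1><p>Welcome.</p></body></html>\n",
)
example = to_training_example(pair)
```

The tag check only guards against empty or plain-text targets; it does not verify that the HTML actually renders to the paired screenshot.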

Taxonomy

Topics: Web Data Mining and Analysis · Mobile and Web Applications