Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering

Chenglei Si; Yanzhe Zhang; Ryan Li; Zhengyuan Yang; Ruibo Liu; Diyi Yang

arXiv:2403.03163·cs.CL·February 11, 2025·3 cites

TL;DR

This paper introduces Design2Code, a benchmark for evaluating multimodal large language models' ability to convert webpage screenshots into accurate code, highlighting current models' limitations in visual element recall and layout accuracy.

Contribution

It presents the first real-world benchmark for multimodal code generation from visual designs, including curated datasets, evaluation metrics, and comprehensive model testing.

Findings

01 Models struggle with visual element recall.

02 Models often generate incorrect layouts.

03 Benchmark reveals significant room for improvement.
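To make "visual element recall" concrete, here is a minimal sketch of one way to estimate how much of a reference page's visible text survives in a generated page. It is a toy illustration only and is not the benchmark's own metric, which this page does not specify. The names `TextCollector`, `text_fragments`, and `text_recall`, and the two sample HTML strings, are hypothetical.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect visible text fragments from an HTML string (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fragments = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collapse whitespace and lowercase so trivial formatting differences do not count.
        text = " ".join(data.split())
        if text and self._skip_depth == 0:
            self.fragments.append(text.lower())


def text_fragments(html):
    """Return the list of normalized text fragments found in the HTML."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.fragments


def text_recall(reference_html, generated_html):
    """Fraction of reference text fragments that also appear in the generated page."""
    reference = text_fragments(reference_html)
    if not reference:
        return 1.0  # assumption: nothing to recall counts as perfect recall
    generated = set(text_fragments(generated_html))
    found = sum(1 for fragment in reference if fragment in generated)
    return found / len(reference)


# Made-up example pages: the generated page drops the "Contact" link.
reference = "<h1>Shop</h1><p>Free shipping</p><button>Buy now</button><a>Contact</a>"
generated = "<h1>Shop</h1><p>Free shipping</p><button>Buy now</button>"
recall = text_recall(reference, generated)
print(f"text recall: {recall:.2f}")
```

Exact text matching is a deliberately crude proxy: it ignores images, layout, position, and styling, so it says nothing about the layout errors listed in finding 02.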

Abstract

Generative AI has made rapid advancements in recent years, achieving unprecedented capabilities in multimodal understanding and code generation. This can enable a new paradigm of front-end development in which multimodal large language models (MLLMs) directly convert visual designs into code implementations. In this work, we construct Design2Code - the first real-world benchmark for this task. Specifically, we manually curate 484 diverse real-world webpages as test cases and develop a set of automatic evaluation metrics to assess how well current multimodal LLMs can generate the code implementations that directly render into the given reference webpages, given the screenshots as input. We also complement automatic metrics with comprehensive human evaluations to validate the performance ranking. To rigorously benchmark MLLMs, we test various multimodal prompting methods on frontier…

Code & Models

Models

SALT-NLP/Design2Code-18B-v0
model · ♡ 42

Videos

Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering

Taxonomy

Topics: Manufacturing Process and Optimization · BIM and Construction Integration

Methods: Sparse Evolutionary Training