MLLM-Based UI2Code Automation Guided by UI Layout Information
Fan Wu, Cuiyun Gao, Shuqing Li, Xin-Cheng Wen, Qing Liao

TL;DR
This paper introduces LayoutCoder, an MLLM-based framework that improves UI2Code automation by effectively understanding UI layouts and generating accurate code, outperforming existing methods on real-world datasets.
Contribution
The paper presents LayoutCoder, a novel multimodal large language model (MLLM) framework that leverages UI layout information to improve code generation from webpage images, addressing the poor generalization of earlier deep learning-based methods to unseen, real-world webpage designs.
Findings
LayoutCoder outperforms state-of-the-art methods in BLEU and CLIP scores.
The framework effectively captures UI layout relations and generates layout-preserving code.
Extensive evaluations demonstrate superior performance on real-world datasets.
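As a rough illustration of what a BLEU-style comparison between generated and reference code measures, the sketch below computes clipped (modified) n-gram precision, the core ingredient of BLEU. It is a simplified sketch, not the paper's evaluation pipeline: the whitespace tokenization, the helper names `ngrams` and `clipped_precision`, and the toy HTML strings are all assumptions made for illustration, and full BLEU additionally combines several n-gram orders with a brevity penalty.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """Modified n-gram precision: each candidate n-gram count is clipped
    by how often that n-gram occurs in the reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


# Toy example (hypothetical data): the generated markup has one stray <p>.
reference = "<div> <p> hello </p> </div>".split()
candidate = "<div> <p> hello </p> <p> </div>".split()

unigram_p = clipped_precision(candidate, reference, 1)
bigram_p = clipped_precision(candidate, reference, 2)
print(f"unigram precision: {unigram_p:.3f}")
print(f"bigram precision:  {bigram_p:.3f}")
```

In this toy case the stray tag lowers bigram precision more than unigram precision, because it breaks two adjacent token pairs while adding only one unmatched token.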
Abstract
Converting user interfaces into code (UI2Code) is a crucial but time-consuming and labor-intensive step in website development. Automating UI2Code streamlines this task and improves development efficiency. Existing deep learning-based methods rely heavily on large amounts of labeled training data and struggle to generalize to real-world, unseen webpage designs. Multimodal Large Language Models (MLLMs) could alleviate this issue, but they have difficulty comprehending the complex layouts in UIs and generating accurate code that preserves the layout. To address these issues, we propose LayoutCoder, a novel MLLM-based framework that generates UI code from real-world webpage images. It includes three key modules: (1) Element Relation Construction, which aims at capturing UI layout by…
