LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models
Zecheng Tang, Chenfei Wu, Juntao Li, Nan Duan

TL;DR
LayoutNUWA introduces a novel approach to graphic layout generation by framing it as a code generation task, leveraging large language models to improve semantic understanding and achieve state-of-the-art results.
Contribution
The paper presents LayoutNUWA, the first model to treat layout generation as code generation, utilizing a Code Instruct Tuning approach with three modules for interpretable and effective layout creation.
Findings
Reports improvements of more than 50% over prior methods on multiple datasets.
Outperforms existing methods in semantic layout understanding.
Provides a transparent, code-based layout generation process.
Abstract
Graphic layout generation, a growing research field, plays a significant role in user engagement and information perception. Existing methods primarily treat layout generation as a numerical optimization task, focusing on quantitative aspects while overlooking the semantic information of layout, such as the relationship between each layout element. In this paper, we propose LayoutNUWA, the first model that treats layout generation as a code generation task to enhance semantic information and harness the hidden layout expertise of large language models (LLMs). More concretely, we develop a Code Instruct Tuning (CIT) approach comprising three interconnected modules: 1) the Code Initialization (CI) module quantifies the numerical conditions and initializes them as HTML code with strategically placed masks; 2) the Code Completion (CC) module employs the formatting knowledge of LLMs to fill…
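To make the Code Initialization step described in the abstract more concrete, here is a minimal sketch of how numerical layout conditions might be quantized and written out as HTML with mask placeholders. Everything specific in it is an assumption for illustration and is not taken from the paper: the `Element` class, the `<M>` mask token, the `rect` tag and `data-category` attribute, and the 120 x 160 canvas are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import List, Optional

MASK = "<M>"  # hypothetical mask token; the paper's actual token may differ


@dataclass
class Element:
    """One layout element; coordinates are normalised to [0, 1], None = unknown."""
    category: str
    left: Optional[float] = None
    top: Optional[float] = None
    width: Optional[float] = None
    height: Optional[float] = None


def quantize(value: Optional[float], canvas_size: int) -> str:
    """Map a normalised value to an integer pixel string, or a mask if unknown."""
    if value is None:
        return MASK
    return str(round(value * canvas_size))


def init_html(elements: List[Element], canvas_w: int = 120, canvas_h: int = 160) -> str:
    """Render elements as an HTML/SVG skeleton with masks for missing values."""
    lines = ["<html>", f'<body><svg width="{canvas_w}" height="{canvas_h}">']
    for el in elements:
        lines.append(
            f'<rect data-category="{el.category}" '
            f'x="{quantize(el.left, canvas_w)}" '
            f'y="{quantize(el.top, canvas_h)}" '
            f'width="{quantize(el.width, canvas_w)}" '
            f'height="{quantize(el.height, canvas_h)}" />'
        )
    lines += ["</svg></body>", "</html>"]
    return "\n".join(lines)


if __name__ == "__main__":
    # A title with a known position and width but unknown height,
    # and an image whose geometry is entirely unknown.
    print(init_html([
        Element("title", left=0.25, top=0.5, width=0.5),
        Element("image"),
    ]))
```

In this sketch the masked attributes are the slots a language model would be asked to complete, which is the role the abstract assigns to the Code Completion module.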
Taxonomy
Topics: Web Data Mining and Analysis · Software Engineering Research · Interactive and Immersive Displays
