LayoutGPT: Compositional Visual Planning and Generation with Large Language Models
Weixi Feng, Wanrong Zhu, Tsu-jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, William Yang Wang

TL;DR
LayoutGPT leverages large language models to generate detailed visual layouts from text descriptions, improving controllability and accuracy in image and scene synthesis across multiple domains.
Contribution
The paper introduces LayoutGPT, a novel approach that uses in-context learning with style sheet language to enhance LLMs' visual planning capabilities for diverse visual generation tasks.
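This page does not reproduce the paper's prompts. The following is therefore only a minimal sketch of what "in-context demonstrations in style sheet language" could look like. It assumes one CSS-like rule per object (`label {width: <n>px; height: <n>px; left: <n>px; top: <n>px; }`) on a hypothetical 64px canvas. The names `Box`, `box_to_css`, `build_prompt`, and `HEADER`, along with the header wording, are illustrative assumptions and are not LayoutGPT's actual code or prompt text.

```python
from dataclasses import dataclass

# Hypothetical instruction line; the real LayoutGPT prompt wording is not given here.
HEADER = "Plan an image layout as CSS-style rules on a 64px x 64px canvas."


@dataclass
class Box:
    """One object in a 2D layout, in pixels (assumed canvas size: 64x64)."""
    label: str
    left: int
    top: int
    width: int
    height: int


def box_to_css(box: Box) -> str:
    # Serialize one object as a CSS-like rule (assumed property order).
    return (f"{box.label} {{width: {box.width}px; height: {box.height}px; "
            f"left: {box.left}px; top: {box.top}px; }}")


def build_prompt(demos, query):
    """Compose in-context demonstrations followed by an open query.

    demos: list of (caption, list[Box]) pairs with known layouts.
    query: caption whose layout the LLM should complete.
    """
    parts = [HEADER, ""]
    for caption, boxes in demos:
        parts.append(f"Prompt: {caption}")
        parts.append("Layout:")
        parts.extend(box_to_css(b) for b in boxes)
        parts.append("")  # blank line separates demonstrations
    # The query ends with an open "Layout:" for the LLM to continue.
    parts.append(f"Prompt: {query}")
    parts.append("Layout:")
    return "\n".join(parts)
```

Under this sketch's assumptions, an LLM would receive the returned string and continue it after the final `Layout:` with one rule per planned object. The demonstrations show the model both the format and typical box sizes and positions.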
Findings
Outperforms existing text-to-image systems by 20-40% in layout accuracy
Achieves human-level performance in numerical and spatial layout design
Effective in 2D image and 3D indoor scene synthesis
Abstract
Attaining a high degree of user controllability in visual generation often requires intricate, fine-grained inputs like layouts. However, such inputs impose a substantial burden on users when compared to simple text inputs. To address the issue, we study how Large Language Models (LLMs) can serve as visual planners by generating layouts from text conditions, and thus collaborate with visual generative models. We propose LayoutGPT, a method to compose in-context visual demonstrations in style sheet language to enhance the visual planning skills of LLMs. LayoutGPT can generate plausible layouts in multiple domains, ranging from 2D images to 3D indoor scenes. LayoutGPT also shows superior performance in converting challenging language concepts like numerical and spatial relations to layout arrangements for faithful text-to-image generation. When combined with a downstream image generation…
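To make the "numerical and spatial relations" point concrete, the sketch below reads a CSS-like layout back into boxes and checks an object count and a "left of" relation directly on the layout. It assumes the same hypothetical rule format and property order as a demonstration prompt of the form `label {width: <n>px; height: <n>px; left: <n>px; top: <n>px; }`. The helpers `parse_layout`, `count_ok`, and `left_of` are hypothetical, and judging "left of" by box centers is an assumed rule. None of this is the paper's evaluation procedure.

```python
import re
from collections import Counter

# Assumed rule format with a fixed property order; malformed lines are skipped.
RULE = re.compile(
    r"^\s*(?P<label>[^{]+?)\s*\{\s*"
    r"width:\s*(?P<width>\d+)px;\s*height:\s*(?P<height>\d+)px;\s*"
    r"left:\s*(?P<left>\d+)px;\s*top:\s*(?P<top>\d+)px;\s*\}\s*$"
)


def parse_layout(text):
    """Return (label, left, top, width, height) tuples for well-formed rules."""
    boxes = []
    for line in text.splitlines():
        m = RULE.match(line)
        if m:
            boxes.append((m["label"], int(m["left"]), int(m["top"]),
                          int(m["width"]), int(m["height"])))
    return boxes


def count_ok(boxes, label, expected):
    # Numerical check: does the layout contain exactly `expected` boxes of `label`?
    return Counter(b[0] for b in boxes)[label] == expected


def _first(boxes, label):
    return next((b for b in boxes if b[0] == label), None)


def left_of(boxes, a, b):
    """Spatial check: is the first `a` box's horizontal center left of the first `b` box's?"""
    box_a, box_b = _first(boxes, a), _first(boxes, b)
    if box_a is None or box_b is None:
        return False
    return box_a[1] + box_a[3] / 2 < box_b[1] + box_b[3] / 2
```

Skipping malformed lines, rather than raising, keeps the parser tolerant of the extra text an LLM may emit around the rules.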
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Handwritten Text Recognition Techniques
