PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM

Tao Yang; Yingmin Luo; Zhongang Qi; Yang Wu; Ying Shan; Chang Wen Chen

arXiv:2406.02884 · cs.CV · November 27, 2024 · 1 citation

TL;DR

PosterLLaVa introduces a unified multi-modal layout generator leveraging large language models, achieving state-of-the-art results and enabling automated, flexible graphic design tasks including user-constrained poster creation.

Contribution

It presents a novel data-driven framework using structured text and visual instruction tuning for multi-modal layout generation, along with new challenging datasets and an automated poster system.

Findings

1. Achieved state-of-the-art performance on public multi-modal layout generation benchmarks.
2. Developed two new datasets for complex design tasks.
3. Created an automated SVG poster generation system.

Abstract

Layout generation is the keystone of automated graphic design: it requires arranging the position and size of various multi-modal design elements in a visually pleasing and constraint-following manner. Previous approaches are either inefficient for large-scale applications or lack flexibility for varying design requirements. Our research introduces a unified framework for automated graphic layout generation that leverages a multi-modal large language model (MLLM) to accommodate diverse design tasks. In contrast to these earlier approaches, our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts under specific visual and textual constraints, including user-defined natural language specifications. We conducted extensive experiments and achieved state-of-the-art (SOTA) performance on public multi-modal layout generation benchmarks, demonstrating the…
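
The structured-text (JSON) layout representation mentioned in the abstract can be illustrated with a minimal sketch. The field names (`category`, `x`, `y`, `w`, `h`), the normalized coordinate convention, and the helper functions below are assumptions made for illustration; they are not taken from the paper or its repository, whose actual schema may differ.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Element:
    # Hypothetical fields; the paper's actual JSON schema may differ.
    category: str   # e.g. "title", "logo", "text"
    x: float        # left edge, normalized to [0, 1]
    y: float        # top edge, normalized to [0, 1]
    w: float        # width, normalized to [0, 1]
    h: float        # height, normalized to [0, 1]


def layout_to_json(elements):
    """Serialize a layout into structured text a model could read or emit."""
    return json.dumps([asdict(e) for e in elements])


def parse_layout(text):
    """Parse JSON text back into elements, rejecting boxes outside the canvas."""
    elements = [Element(**item) for item in json.loads(text)]
    for e in elements:
        inside = (0 <= e.x and 0 <= e.y and e.w > 0 and e.h > 0
                  and e.x + e.w <= 1 and e.y + e.h <= 1)
        if not inside:
            raise ValueError(f"box outside canvas: {e}")
    return elements


layout = [
    Element("title", 0.1, 0.05, 0.8, 0.15),
    Element("text", 0.1, 0.25, 0.8, 0.1),
]
encoded = layout_to_json(layout)   # structured text form of the layout
decoded = parse_layout(encoded)    # round-trips back to Element objects
```

Representing a layout as plain JSON text is what would let a language model consume and produce it like any other token sequence, while a parser of this kind can check generated output before rendering.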

Code & Models

Repositories

posterllava/posterllava
PyTorch · Official

Taxonomy

Topics: Semantic Web and Ontologies · Web Data Mining and Analysis · Multimedia Communication and Technology