LayoutDiT: Exploring Content-Graphic Balance in Layout Generation with Diffusion Transformer

Yu Li; Yifan Chen; Gongye Liu; Fei Yin; Qingyan Bai; Jie Wu; Hongfa Wang; Ruihang Chu; Yujiu Yang

arXiv:2407.15233·cs.CV·November 26, 2024

Open Access

TL;DR

LayoutDiT is a novel diffusion transformer framework that effectively balances content and graphic features to generate high-quality, visually appealing layouts, addressing limitations of previous methods in spatial accuracy and aesthetics.

Contribution

We propose LayoutDiT, which introduces an adaptive balancing factor and a saliency bounding box to improve content-graphic harmony in layout generation using diffusion transformers.
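As a rough illustration of what a saliency bounding box could look like, the sketch below computes the tightest box around the salient pixels of an image. It is only a sketch under stated assumptions, not the paper's procedure: the saliency map is assumed to be a 2D grid of scores in [0, 1], and the function name `saliency_bounding_box` and the `threshold` parameter are hypothetical.

```python
from typing import List, Optional, Tuple

# (left, top, right, bottom) as inclusive pixel indices
Box = Tuple[int, int, int, int]


def saliency_bounding_box(
    saliency: List[List[float]], threshold: float = 0.5
) -> Optional[Box]:
    """Return the tightest box enclosing every pixel with score >= threshold.

    Assumption: `saliency` is a row-major 2D grid of scores in [0, 1].
    Returns None when no pixel reaches the threshold.
    """
    # Rows that contain at least one salient pixel, in ascending order.
    rows = [y for y, row in enumerate(saliency) if any(v >= threshold for v in row)]
    if not rows:
        return None
    # Column index of every salient pixel across all rows.
    cols = [x for row in saliency for x, v in enumerate(row) if v >= threshold]
    return (min(cols), rows[0], max(cols), rows[-1])


# Hypothetical 4x4 saliency map with a salient region in the centre.
example_map = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.7, 0.0],
    [0.0, 0.2, 0.8, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
example_box = saliency_bounding_box(example_map)
```

A box of this kind gives a compact description of where the important image content sits, which a layout model could then use as a conditioning signal.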

Findings

01 Outperforms existing methods in constrained and unconstrained settings

02 Generates layouts with fewer overlaps and better spatial alignment

03 Achieves higher aesthetic quality and content coherence
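To make the notion of "fewer overlaps" concrete, the sketch below scores a layout by the mean intersection-over-union (IoU) across all pairs of element boxes. This is one common way to quantify overlap and is an assumption here, not necessarily the metric used in the paper. The box format `(left, top, right, bottom)` and the function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def area(box: Box) -> float:
    """Area of an axis-aligned box."""
    return (box[2] - box[0]) * (box[3] - box[1])


def intersection_area(a: Box, b: Box) -> float:
    """Area shared by two axis-aligned boxes (0.0 if they do not overlap)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def mean_pairwise_iou(boxes: Sequence[Box]) -> float:
    """Average IoU over all unordered pairs of boxes; lower means less overlap."""
    pairs = list(combinations(boxes, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        inter = intersection_area(a, b)
        union = area(a) + area(b) - inter
        total += inter / union if union > 0 else 0.0
    return total / len(pairs)


# Two hypothetical 2x2 elements sharing a 1x1 corner region.
overlapping_layout = [(0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)]
overlap_score = mean_pairwise_iou(overlapping_layout)
```

Under this measure a layout with fully disjoint elements scores 0.0, so comparing scores between methods gives a simple proxy for how often generated elements collide.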

Abstract

Layout generation is a foundational task in graphic design that requires integrating visual aesthetics with the harmonious delivery of content. However, existing methods still struggle to generate precise and visually appealing layouts, producing defects such as blocking, overlapping, undersized elements, and spatial misalignment. We find that these methods overlook the crucial balance between learning content-aware and graphic-aware features. This oversight limits their ability to model the graphic structure of layouts and to generate reasonable layout arrangements. To address these challenges, we introduce LayoutDiT, an effective framework that balances content and graphic features to generate high-quality, visually appealing layouts. Specifically, we first design an adaptive factor that optimizes the model's awareness of the layout generation space, balancing the model's…

Taxonomy

Topics: Video Analysis and Summarization · Semantic Web and Ontologies · Image Retrieval and Classification Techniques

Methods: Focus · Diffusion