LayoutDiT: Exploring Content-Graphic Balance in Layout Generation with Diffusion Transformer
Yu Li, Yifan Chen, Gongye Liu, Fei Yin, Qingyan Bai, Jie Wu, Hongfa Wang, Ruihang Chu, Yujiu Yang

TL;DR
LayoutDiT is a novel diffusion transformer framework that effectively balances content and graphic features to generate high-quality, visually appealing layouts, addressing limitations of previous methods in spatial accuracy and aesthetics.
Contribution
We propose LayoutDiT, which introduces an adaptive balancing factor and a saliency bounding box to improve content-graphic harmony in layout generation using diffusion transformers.
Findings
Outperforms existing methods in constrained and unconstrained settings
Generates layouts with fewer overlaps and better spatial alignment
Achieves higher aesthetic quality and content coherence
Abstract
Layout generation is a foundational task in graphic design that requires integrating visual aesthetics with the harmonious expression of content. However, existing methods still struggle to generate precise and visually appealing layouts, producing defects such as blocking, overlapping, undersized elements, and spatial misalignment. We found that these methods overlook the crucial balance between learning content-aware and graphic-aware features. This oversight limits their ability to model the graphic structure of layouts and to generate reasonable layout arrangements. To address these challenges, we introduce LayoutDiT, an effective framework that balances content and graphic features to generate high-quality, visually appealing layouts. Specifically, we first design an adaptive factor that optimizes the model's awareness of the layout generation space, balancing the model's…
Taxonomy
Topics: Video Analysis and Summarization · Semantic Web and Ontologies · Image Retrieval and Classification Techniques
Methods: Focus · Diffusion
