UniLayDiff: A Unified Diffusion Transformer for Content-Aware Layout Generation

Zeyang Liu; Le Wang; Sanping Zhou; Yuxuan Wu; Xiaolong Sun; Gang Hua; Haoxiang Li

arXiv:2512.08897·cs.CV·December 10, 2025

Open Access

TL;DR

UniLayDiff introduces a single, end-to-end trainable diffusion transformer model that unifies various content-aware layout generation tasks, improving quality and versatility in graphic design automation.

Contribution

UniLayDiff is presented as the first model to unify diverse content-aware layout generation tasks, using a Multi-Modal Diffusion Transformer with relation constraint fine-tuning.

Findings

01. Achieves state-of-the-art performance across multiple layout generation tasks.

02. Successfully unifies unconditional and conditional layout generation in one model.

03. Enhances layout quality through relation constraint integration.

Abstract

Content-aware layout generation is a critical task in graphic design automation, focused on creating visually appealing arrangements of elements that blend seamlessly with a given background image. The variety of real-world applications makes it highly challenging to develop a single model capable of unifying the diverse range of input-constrained generation sub-tasks, such as those conditioned on element types, sizes, or their relationships. Current methods either address only a subset of these tasks or require separate model parameters for different conditions, and so fail to offer a truly unified solution. In this paper, we propose UniLayDiff, a Unified Diffusion Transformer that, for the first time, addresses various content-aware layout generation tasks with a single, end-to-end trainable model. Specifically, we treat layout constraints as a distinct modality and employ Multi-Modal…
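To make the sub-tasks named in the abstract more concrete, the sketch below shows one possible way to derive the constraint input for each of them from a ground-truth layout. It is an illustration only, not UniLayDiff's actual encoding: the `Element` and `Constraint` classes, the task names, the `relation` vocabulary, and the normalized center-based box format are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Element:
    """One layout element; box is center-based and normalized to [0, 1] (assumed format)."""
    category: str
    x: float
    y: float
    w: float
    h: float


@dataclass
class Constraint:
    """Per-element constraint; None means the field is left for the model to generate."""
    category: Optional[str] = None
    size: Optional[Tuple[float, float]] = None


def relation(a: Element, b: Element) -> str:
    """Coarse vertical relation of a with respect to b (hypothetical vocabulary).

    Image coordinates are assumed, so a smaller y is higher on the canvas.
    """
    if a.y + a.h / 2 <= b.y - b.h / 2:
        return "above"
    if a.y - a.h / 2 >= b.y + b.h / 2:
        return "below"
    return "overlapping_rows"


def build_constraints(
    layout: List[Element], task: str
) -> Tuple[List[Constraint], List[Tuple[int, str, int]]]:
    """Return (per-element constraints, pairwise relations) for a sub-task."""
    if task == "unconditional":
        return [Constraint() for _ in layout], []
    if task == "type":
        return [Constraint(category=e.category) for e in layout], []
    if task == "type_size":
        return [Constraint(e.category, (e.w, e.h)) for e in layout], []
    if task == "relation":
        per_element = [Constraint(category=e.category) for e in layout]
        pairs = [
            (i, relation(layout[i], layout[j]), j)
            for i in range(len(layout))
            for j in range(i + 1, len(layout))
        ]
        return per_element, pairs
    raise ValueError(f"unknown task: {task}")
```

Under these assumptions, every sub-task yields the same two outputs with different amounts of information filled in, which is the kind of shared interface that lets a single model consume all of them.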

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · 3D Shape Modeling and Analysis