CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation
Hui Zhang, Dexiang Hong, Yitong Wang, Jie Shao, Xinglong Wu, Zuxuan Wu, Yu-Gang Jiang

TL;DR
CreatiLayout introduces a multimodal diffusion transformer with a siamese architecture and a large-scale dataset for precise, controllable, and creative layout-to-image generation, leveraging layout planning and optimization.
Contribution
The paper presents SiamLayout, a multimodal diffusion transformer that uses a siamese structure for layout guidance, and introduces the LayoutSAM dataset and the Layout Designer for layout-to-image generation.
Findings
Effective integration of layout guidance into MM-DiT.
Large-scale LayoutSAM dataset with 2.7 million image-text pairs.
Improved quality and controllability in layout-to-image generation.
Abstract
Diffusion models have been recognized for their ability to generate images that are not only visually appealing but also of high artistic quality. As a result, Layout-to-Image (L2I) generation has been proposed to leverage region-specific positions and descriptions to enable more precise and controllable generation. However, previous methods primarily focus on UNet-based models (e.g., SD1.5 and SDXL), and limited effort has explored Multimodal Diffusion Transformers (MM-DiTs), which have demonstrated powerful image generation capabilities. Enabling MM-DiT for layout-to-image generation seems straightforward but is challenging due to the complexity of how layout is introduced, integrated, and balanced among multiple modalities. To this end, we explore various network variants to efficiently incorporate layout guidance into MM-DiT, and ultimately present SiamLayout. To inherit the…
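To make the "region-specific positions and descriptions" in the abstract concrete, here is a minimal sketch of how a layout input might be represented. The class names, the normalized `(x0, y0, x1, y1)` box convention, and the helper functions are illustrative assumptions and are not taken from the paper or its code.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical convention: boxes are (x0, y0, x1, y1), normalized to [0, 1].
BBox = Tuple[float, float, float, float]


@dataclass
class Region:
    """One layout entity: a text description tied to a bounding box."""
    description: str
    bbox: BBox


@dataclass
class Layout:
    """A global caption plus its region-level annotations."""
    caption: str
    regions: List[Region]


def is_valid_bbox(bbox: BBox) -> bool:
    """Check that the box lies inside the unit square and has positive area."""
    x0, y0, x1, y1 = bbox
    return 0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0


def to_pixels(bbox: BBox, width: int, height: int) -> Tuple[int, int, int, int]:
    """Scale a normalized box to pixel coordinates for a given image size."""
    x0, y0, x1, y1 = bbox
    return (round(x0 * width), round(y0 * height),
            round(x1 * width), round(y1 * height))


layout = Layout(
    caption="A dog sitting beside a red bicycle",
    regions=[
        Region("a brown dog", (0.10, 0.50, 0.40, 0.90)),
        Region("a red bicycle", (0.25, 0.50, 0.75, 1.00)),
    ],
)
```

A layout-conditioned generator would consume the caption together with each region's description and box. How those are tokenized and fused inside the model is specific to each method and is not shown here.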
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Simulation and Modeling Applications
Methods: Sparse Evolutionary Training · Diffusion · Focus
