Laytrol: Preserving Pretrained Knowledge in Layout Control for Multimodal Diffusion Transformers

Sida Huang; Siqi Huang; Ping Luo; Hongyuan Zhang

arXiv:2511.07934·cs.CV·November 12, 2025



TL;DR

This paper introduces Laytrol, a layout control network that preserves pretrained knowledge in diffusion models for improved layout-to-image generation, utilizing a new dataset and specialized initialization schemes.

Contribution

We propose Laytrol, a novel layout control network that maintains pretrained knowledge in diffusion models, along with the LaySyn dataset to reduce distribution shift.

Findings

01 Laytrol improves image quality and layout accuracy.

02 The method preserves pretrained knowledge effectively.

03 Experiments show superior performance over existing methods.

Abstract

With the development of diffusion models, enhancing spatial controllability in text-to-image generation has become a vital challenge. As a representative task for addressing this challenge, layout-to-image generation aims to generate images that are spatially consistent with the given layout condition. Existing layout-to-image methods typically introduce the layout condition by integrating adapter modules into the base generative model. However, the generated images often exhibit low visual quality and stylistic inconsistency with the base model, indicating a loss of pretrained knowledge. To alleviate this issue, we construct the Layout Synthesis (LaySyn) dataset, which leverages images synthesized by the base model itself to mitigate the distribution shift from the pretraining data. Moreover, we propose the Layout Control (Laytrol) Network, in which parameters are inherited from MM-DiT…
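To make "spatially consistent with the given layout condition" concrete, the following is a minimal sketch of one common way to score it: match each requested box against same-label detections by intersection over union (IoU). This is an illustration only, not necessarily the metric used in the paper. The `Box` class, the `layout_consistency` function, the 0.5 threshold, and the assumption that detections come from some external object detector are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in normalized [0, 1] image coordinates (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        # Degenerate or inverted boxes have zero area.
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    inter = Box(
        max(a.x0, b.x0), max(a.y0, b.y0),
        min(a.x1, b.x1), min(a.y1, b.y1),
    ).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def layout_consistency(layout, detections, threshold=0.5):
    """Fraction of layout entries matched by a same-label detection.

    `layout` and `detections` are lists of (label, Box) pairs. The threshold
    of 0.5 is an assumed default, not a value taken from the paper.
    """
    if not layout:
        return 1.0
    hits = 0
    for label, box in layout:
        if any(d_label == label and iou(box, d_box) >= threshold
               for d_label, d_box in detections):
            hits += 1
    return hits / len(layout)


# Example: the "cat" box is reproduced exactly, the "dog" box is misplaced.
layout = [("cat", Box(0.1, 0.1, 0.5, 0.5)), ("dog", Box(0.6, 0.6, 0.9, 0.9))]
detections = [("cat", Box(0.1, 0.1, 0.5, 0.5)), ("dog", Box(0.0, 0.0, 0.2, 0.2))]
score = layout_consistency(layout, detections)
```

Here `score` counts one of the two requested objects as correctly placed. A score of this kind captures layout accuracy only; it says nothing about the visual quality or stylistic consistency that the abstract also raises.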



Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Multimodal Machine Learning Applications