Training-free Regional Prompting for Diffusion Transformers
Anthony Chen, Jianjin Xu, Wenzhao Zheng, Gaole Dai, Yida Wang, Renrui Zhang, Haofan Wang, Shanghang Zhang

TL;DR
This paper introduces a training-free regional prompting method for Diffusion Transformers like FLUX, enhancing their ability to generate complex, multi-object images from detailed prompts by manipulating attention mechanisms.
Contribution
It presents the first regional prompting technique for Diffusion Transformers, enabling fine-grained text-to-image generation without additional training.
Findings
Enables detailed compositional image generation from complex prompts.
Operates in a training-free manner using attention manipulation.
Applicable to recent Diffusion Transformer architectures like FLUX.
Abstract
Diffusion models have demonstrated excellent capabilities in text-to-image generation. Their semantic understanding (i.e., prompt-following) ability has also been greatly improved by large language models (e.g., T5, Llama). However, existing models cannot perfectly handle long and complex text prompts, especially when the prompts contain various objects with numerous attributes and interrelated spatial relationships. While many regional prompting methods have been proposed for UNet-based models (SD1.5, SDXL), there are still no implementations based on the recent Diffusion Transformer (DiT) architecture, such as SD3 and FLUX.1. In this report, we propose and implement regional prompting for FLUX.1 based on attention manipulation, which gives DiT fine-grained compositional text-to-image generation capability in a training-free manner. Code is available at…
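To make "attention manipulation" concrete, here is a minimal sketch of one way a regional attention mask could be built. It is not the authors' implementation. It assumes a joint token sequence in which all regional text tokens come first, followed by a row-major flattened grid of image tokens, and that each regional prompt is paired with a rectangular box on that grid. The names `Box`, `region_token_ids`, and `build_regional_mask` are hypothetical, as is the rule that image tokens outside every box may attend to all image tokens.

```python
from typing import List, Tuple

# Hypothetical box format: (top, left, bottom, right) in grid cells, end-exclusive.
Box = Tuple[int, int, int, int]


def region_token_ids(box: Box, grid_w: int) -> List[int]:
    """Row-major image-token indices covered by a box."""
    top, left, bottom, right = box
    return [r * grid_w + c for r in range(top, bottom) for c in range(left, right)]


def build_regional_mask(
    text_lens: List[int], boxes: List[Box], grid_h: int, grid_w: int
) -> List[List[bool]]:
    """Boolean mask over the joint [text tokens | image tokens] sequence.

    mask[q][k] is True when query token q may attend to key token k.
    """
    if len(text_lens) != len(boxes):
        raise ValueError("need exactly one box per regional prompt")

    n_text = sum(text_lens)
    n = n_text + grid_h * grid_w
    mask = [[False] * n for _ in range(n)]

    start = 0
    covered = set()
    for length, box in zip(text_lens, boxes):
        text_ids = list(range(start, start + length))
        image_ids = [n_text + i for i in region_token_ids(box, grid_w)]
        start += length
        covered.update(image_ids)

        # A region's prompt tokens and image tokens attend only to each other.
        group = text_ids + image_ids
        for q in group:
            for k in group:
                mask[q][k] = True

    # Assumption of this sketch: image tokens outside every box attend to
    # all image tokens, so that no row of the mask is entirely False.
    for q in range(n_text, n):
        if q not in covered:
            for k in range(n_text, n):
                mask[q][k] = True

    return mask
```

For example, `build_regional_mask([2, 3], [(0, 0, 2, 1), (0, 1, 2, 2)], 2, 2)` splits a 2×2 grid into a left and a right column, each tied to its own prompt. In practice such a mask would be passed to the attention layers of the model in place of unrestricted attention; how it is blended with a global prompt is left out here.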
Taxonomy
Topics: Neural Networks and Applications · Neural Networks and Reservoir Computing · Magneto-Optical Properties and Applications
Methods: Gated Linear Unit · Attention Is All You Need · Linear Layer · SentencePiece · Position-Wise Feed-Forward Layer · Inverse Square Root Schedule · Adam · Attention Dropout · Multi-Head Attention
