Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement

Zhennan Chen, Yajie Li, Haofan Wang, Zhibo Chen, Zhengkai Jiang, Jun Li, Qian Wang, Jian Yang, Ying Tai

arXiv:2411.06558·cs.CV·November 19, 2024

TL;DR

RAG introduces a tuning-free, region-aware text-to-image generation method that enables precise layout control, flexible region modification, and improved attribute and relationship fidelity without additional training modules.

Contribution

It proposes a two-step regional generation approach, Regional Hard Binding followed by Regional Soft Refinement, that enables flexible, high-quality, and controllable image synthesis from regional text prompts.

Findings

1. Outperforms previous tuning-free methods in attribute binding and object relationships.
2. Enables user-modifiable regions without additional inpainting models.
3. Achieves superior spatial control and detail refinement in generated images.
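Finding 2 concerns changing one region while leaving the rest of the image untouched. The sketch below shows only the generic idea of a mask-restricted update on a latent tensor, assuming latents shaped (C, H, W) and a binary mask shaped (H, W). `repaint_region` is a hypothetical helper written for illustration; it is a plain mask blend, not RAG's repainting procedure, which lives in the official repository.

```python
import numpy as np

def repaint_region(prev_latent, new_latent, mask):
    """Hypothetical helper: take new_latent inside the mask, keep prev_latent elsewhere.

    prev_latent, new_latent: arrays of shape (C, H, W).
    mask: array of shape (H, W) with values in {0, 1}.
    """
    if prev_latent.shape != new_latent.shape:
        raise ValueError("latents must have the same shape")
    if mask.shape != prev_latent.shape[1:]:
        raise ValueError("mask must match the spatial size (H, W) of the latents")
    # Add a leading channel axis so the mask broadcasts over all C channels.
    m = mask[None, :, :].astype(prev_latent.dtype)
    # Inside the mask (m == 1) use the new latent; outside (m == 0) keep the old one.
    return m * new_latent + (1.0 - m) * prev_latent

# Usage: regenerate only a 3x3 patch of an 8x8 latent.
prev = np.zeros((4, 8, 8))
new = np.ones((4, 8, 8))
mask = np.zeros((8, 8))
mask[2:5, 2:5] = 1
out = repaint_region(prev, new, mask)
```

Because every position outside the mask is copied from the previous latent, the unselected regions are unchanged by construction.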

Abstract

Regional prompting, or compositional generation, which enables fine-grained spatial control, has gained increasing attention for its practicality in real-world applications. However, previous methods either introduce additional trainable modules, and are thus applicable only to specific models, or manipulate score maps within cross-attention layers using attention masks, which yields limited control strength as the number of regions increases. To address these limitations, we present RAG, a Region-Aware text-to-image Generation method conditioned on regional descriptions for precise layout composition. RAG decouples multi-region generation into two sub-tasks: the construction of individual regions (Regional Hard Binding), which ensures that each regional prompt is properly executed, and overall detail refinement across regions (Regional Soft Refinement), which dissolves the visual boundaries and…
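As a rough illustration of the two sub-tasks named in the abstract, the NumPy sketch below assumes latents and feature maps shaped (C, H, W) and one binary (H, W) mask per regional prompt. `hard_bind` pastes each separately generated region into the base latent, and `soft_refine` linearly blends region-conditioned features with base-prompt features using a hypothetical weight `delta`. Both function names and the linear blend are assumptions for exposition; the paper's denoising schedule, the layers where refinement is applied, and its exact fusion rule are not reproduced here.

```python
import numpy as np

def hard_bind(base_latent, region_latents, masks):
    """Hypothetical sketch: paste each region's latent into the base latent inside its mask.

    base_latent: (C, H, W); region_latents: list of (C, H, W); masks: list of (H, W) in {0, 1}.
    Where masks overlap, later regions overwrite earlier ones (an arbitrary choice here).
    """
    out = base_latent.copy()  # leave the caller's base latent unmodified
    for lat, mask in zip(region_latents, masks):
        m = mask.astype(bool)
        out[:, m] = lat[:, m]  # boolean index over (H, W), applied to every channel
    return out

def soft_refine(h_base, h_regions, masks, delta):
    """Hypothetical sketch: blend region-conditioned features into base-prompt features.

    delta in [0, 1]: 1.0 keeps the pure regional features inside each mask,
    0.0 keeps only the base features. Outside all masks the result equals h_base.
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    regional = h_base.copy()
    for h, mask in zip(h_regions, masks):
        m = mask.astype(bool)
        regional[:, m] = h[:, m]
    return delta * regional + (1.0 - delta) * h_base

# Usage: one region covering the top-left quadrant of an 8x8 latent.
base = np.zeros((4, 8, 8))
region = np.ones((4, 8, 8))
mask = np.zeros((8, 8))
mask[:4, :4] = 1
bound = hard_bind(base, [region], [mask])
refined = soft_refine(base, [region], [mask], delta=0.25)
```

Setting `delta` below 1 lets base-prompt features leak into each region, which is one simple way to soften seams between regions, while the hard paste in `hard_bind` keeps each regional prompt's content confined to its mask.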

Code & Models

Repositories

nju-pcalab/rag-diffusion
PyTorch · Official

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Handwritten Text Recognition Techniques

Methods: Attention Is All You Need · Linear Layer · Dropout · Linear Warmup With Linear Decay · WordPiece · Dense Connections · Layer Normalization · Adam · Attention Dropout