Detector Guidance for Multi-Object Text-to-Image Generation

Luping Liu, Zijian Zhang, Yi Ren, Rongjie Huang, Xiang Yin, and Zhou Zhao

arXiv:2306.02236 · cs.CV · June 6, 2023 · 2 citations


TL;DR

This paper introduces Detector Guidance, a method that uses a latent object detection model to improve multi-object text-to-image generation by reducing object mixing and enhancing object separation.

Contribution

The paper proposes Detector Guidance (DG), an approach that integrates latent object detection into diffusion models so that they handle multiple objects in a generated image better.
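This page does not spell out how Detector Guidance (DG) turns detection results into guidance. As a purely hypothetical illustration of how detected object regions *could* keep one object's text token from leaking into another object's pixels, the NumPy sketch below zeroes a token's cross-attention weights outside an assumed bounding box and renormalizes each pixel's weights. The function name `mask_token_outside_region`, the box format, and the masking rule are all assumptions for illustration. They are not the paper's DG procedure or the API of the official repository.

```python
import numpy as np


def mask_token_outside_region(attn, boxes, height, width):
    """Hypothetical region masking of a cross-attention map (not the DG rule).

    attn:   (height*width, T) attention map; rows are pixels in row-major order
    boxes:  {token_index: (y0, x0, y1, x1)} assumed detected region per object token,
            half-open, i.e. rows y0..y1-1 and columns x0..x1-1
    Tokens without a box stay unrestricted. Rows are renormalized to sum to 1;
    a row whose weights are all masked away is left as zeros.
    """
    attn = attn.copy()
    ys, xs = np.divmod(np.arange(height * width), width)   # (row, col) of each pixel
    for tok, (y0, x0, y1, x1) in boxes.items():
        inside = (ys >= y0) & (ys < y1) & (xs >= x0) & (xs < x1)
        attn[~inside, tok] = 0.0                           # block the token outside its box
    row_sums = attn.sum(axis=-1, keepdims=True)
    return np.divide(attn, row_sums, out=np.zeros_like(attn), where=row_sums > 0)


# Toy example: 4x4 latent, token 0 = a shared word, token 1 = "cat" (left half),
# token 2 = "dog" (right half). Boxes and token roles are made up for illustration.
H, W, T = 4, 4, 3
uniform = np.full((H * W, T), 1.0 / T)
masked = mask_token_outside_region(uniform, {1: (0, 0, 4, 2), 2: (0, 2, 4, 4)}, H, W)
print(masked[0])  # pixel (0, 0): "dog" weight removed, remaining weights 0.5 and 0.5
```

After masking, the "cat" token can only shape pixels inside its own box and the "dog" token only pixels inside its box, which captures the intuition of separating conflicting objects. How DG actually detects objects and adjusts attention is described in the paper and its official code.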

Findings

1. DG improves object separation in generated images.

2. Human evaluations show an 8-22% reduction in object mixing.

3. DG outperforms baseline methods on the COCO, CC, and MRO benchmarks.

Abstract

Diffusion models have demonstrated impressive performance in text-to-image generation. They utilize a text encoder and cross-attention blocks to infuse textual information into images at a pixel level. However, their capability to generate images with text containing multiple objects is still restricted. Previous works identify the problem of information mixing in the CLIP text encoder and introduce the T5 text encoder or incorporate strong prior knowledge to assist with the alignment. We find that mixing problems also occur on the image side and in the cross-attention blocks. The noisy images can cause different objects to appear similar, and the cross-attention blocks inject information at a pixel level, leading to leakage of global object understanding and resulting in object mixing. In this paper, we introduce Detector Guidance (DG), which integrates a latent object detection model…
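The abstract's point that cross-attention injects textual information at a pixel level can be made concrete with a minimal NumPy sketch of generic scaled dot-product cross-attention. This is the standard mechanism, not code from the DG repository, and the shapes and random weights are illustrative assumptions. Each flattened image position acts as a query over the text tokens, and its softmax weights are computed independently of every other position.

```python
import numpy as np


def cross_attention(pixels, tokens, w_q, w_k, w_v):
    """Generic scaled dot-product cross-attention from image latents to text tokens.

    pixels: (N, d_img) flattened image latents, one row per spatial position
    tokens: (T, d_txt) text-encoder outputs, one row per prompt token
    w_q, w_k, w_v: projection matrices with shapes (d_img, d), (d_txt, d), (d_txt, d)
    Returns the injected features (N, d) and the attention map (N, T).
    """
    q = pixels @ w_q                                  # (N, d) one query per pixel
    k = tokens @ w_k                                  # (T, d) one key per token
    v = tokens @ w_v                                  # (T, d) one value per token
    scores = q @ k.T / np.sqrt(q.shape[-1])           # (N, T) pixel-to-token similarity
    scores -= scores.max(axis=-1, keepdims=True)      # stabilize the softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)          # softmax over tokens, separately per pixel
    return attn @ v, attn


# Illustrative usage with random weights (all sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
N, T, d_img, d_txt, d = 16, 4, 8, 6, 5
out, attn = cross_attention(
    rng.normal(size=(N, d_img)),
    rng.normal(size=(T, d_txt)),
    rng.normal(size=(d_img, d)),
    rng.normal(size=(d_txt, d)),
    rng.normal(size=(d_txt, d)),
)
print(out.shape, attn.shape)  # (16, 5) (16, 4)
```

Because each row of `attn` is normalized on its own, nothing in this step forces all pixels of one object to attend to the same token. A single pixel can absorb features from several object words in the prompt, which is one way to see how the pixel-level injection described in the abstract can lead to object mixing.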


Code & Models

Repositories

luping-liu/detector-guidance
PyTorch · Official


Taxonomy

Topics: Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques

Methods: Gated Linear Unit · Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dense Connections · SentencePiece · Adafactor · Residual Connection