Layout-to-Image Generation with Localized Descriptions using ControlNet with Cross-Attention Control

Denis Lukovnikov; Asja Fischer

arXiv:2402.13404 · cs.CV · February 22, 2024 · 1 citation

Open Access

TL;DR

This paper enhances ControlNet's ability to generate images from localized textual descriptions by modifying cross-attention scores during inference, enabling fine-grained control without additional training.

Contribution

It introduces a training-free cross-attention control method that improves layout-to-image generation with localized descriptions in ControlNet.

Findings

01. Improved control over image regions using localized descriptions

02. Reduction of concept bleeding and image degradation

03. Effective in challenging layout scenarios

Abstract

While text-to-image diffusion models can generate high-quality images from textual descriptions, they generally lack fine-grained control over the visual composition of the generated images. Some recent works tackle this problem by training the model to condition the generation process on additional input describing the desired image layout. Arguably the most popular among such methods, ControlNet, enables a high degree of control over the generated image using various types of conditioning inputs (e.g. segmentation maps). However, it still lacks the ability to take into account localized textual descriptions that indicate which image region is described by which phrase in the prompt. In this work, we show the limitations of ControlNet for the layout-to-image task and enable it to use localized descriptions using a training-free approach that modifies the cross-attention scores during…
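To make the idea of region-aware cross-attention concrete, here is a minimal NumPy sketch. It is a generic illustration of biasing cross-attention scores with a layout mask, not the paper's actual method: the function name `masked_cross_attention`, the `pixel_region` and `token_region` arrays, and the hard `penalty` value are all assumptions introduced for this example. Each prompt token is assigned to one region (or to `-1` for tokens that apply to the whole image), and each pixel may only attend to tokens of its own region or to global tokens.

```python
import numpy as np


def masked_cross_attention(q, k, v, pixel_region, token_region, penalty=-1e9):
    """Cross-attention whose scores are restricted by a layout (illustrative only).

    q: (P, d) pixel queries; k: (T, d) token keys; v: (T, d_v) token values.
    pixel_region: (P,) region id of each pixel.
    token_region: (T,) region id of each token, -1 for global tokens.
    At least one global token is assumed, so every pixel has a token to attend to.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (P, T) raw attention scores
    # A pixel may attend to a token if the token is global or shares its region.
    allowed = (token_region[None, :] == pixel_region[:, None]) | (
        token_region[None, :] == -1
    )
    scores = np.where(allowed, scores, penalty)  # suppress cross-region pairs
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Toy example: 4 pixels in 2 regions, 3 tokens (one global, one per region).
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(3, 8))
v = rng.normal(size=(3, 5))
pixel_region = np.array([0, 0, 1, 1])
token_region = np.array([-1, 0, 1])
out, weights = masked_cross_attention(q, k, v, pixel_region, token_region)
```

In this toy setup, pixels in region 0 place no attention weight on the token that describes region 1, and vice versa, which is one simple way to limit the concept bleeding mentioned in the findings. A softer additive bias could be used in place of the hard penalty; that choice is likewise an assumption of this sketch.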

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Medical Image Segmentation Techniques

Methods: Diffusion