LoCo: Locally Constrained Training-Free Layout-to-Image Synthesis

Peiang Zhao; Han Li; Ruiyang Jin; S. Kevin Zhou

arXiv:2311.12342 · cs.CV · March 27, 2024 · 2 cites

TL;DR

LoCo is a training-free method that improves layout-to-image synthesis by using localized attention and padding token constraints, enabling precise control over object placement and appearance in generated images.

Contribution

Introducing LoCo, a training-free framework with novel constraints that enhance spatial control and semantic consistency in layout-to-image synthesis.

Findings

01 Outperforms existing methods in quality and accuracy

02 Enhances spatial control in image generation

03 Improves semantic consistency with layout instructions

Abstract

Recent text-to-image diffusion models have reached an unprecedented level in generating high-quality images. However, their exclusive reliance on textual prompts often falls short in precise control of image compositions. In this paper, we propose LoCo, a training-free approach for layout-to-image synthesis that excels in producing high-quality images aligned with both textual prompts and layout instructions. Specifically, we introduce a Localized Attention Constraint (LAC), which leverages the semantic affinity between pixels in self-attention maps to create precise representations of desired objects and to ensure that objects are placed accurately in their designated regions. We further propose a Padding Token Constraint (PTC) to leverage the semantic information embedded in previously neglected padding tokens, improving the consistency between object appearance and layout instructions. LoCo…
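
The abstract describes LAC only at a high level, so the following is a minimal NumPy sketch of one plausible reading, not the authors' formulation. It assumes that a token's cross-attention map is propagated through a self-attention affinity matrix, and that a penalty measures how much of the resulting attention mass falls outside the token's layout box. The function names, the box format `(x0, y0, x1, y1)`, the squared penalty, and the random toy inputs are all hypothetical choices made for illustration.

```python
import numpy as np


def refine_with_self_attention(cross_attn, self_attn):
    """Spread one token's cross-attention along pixel-to-pixel affinities.

    cross_attn: shape (H*W,), attention of each pixel to a single text token.
    self_attn:  shape (H*W, H*W), row-normalized pixel affinity matrix.
    Returns a map of shape (H*W,) normalized to sum to (approximately) 1.
    """
    refined = self_attn @ cross_attn
    return refined / (refined.sum() + 1e-8)


def box_mask(height, width, box):
    """Flattened binary mask for a box (x0, y0, x1, y1) in pixel units."""
    x0, y0, x1, y1 = box
    mask = np.zeros((height, width), dtype=np.float64)
    mask[y0:y1, x0:x1] = 1.0
    return mask.reshape(-1)


def localized_loss(attn, mask):
    """Assumed penalty in [0, 1]; near 0 when all attention lies in the box."""
    inside = (attn * mask).sum()
    total = attn.sum() + 1e-8
    return (1.0 - inside / total) ** 2


# Toy inputs standing in for attention maps read out of a diffusion model.
rng = np.random.default_rng(0)
H = W = 8
cross = rng.random(H * W)
affinity = rng.random((H * W, H * W))
affinity /= affinity.sum(axis=1, keepdims=True)  # make each row sum to 1

refined = refine_with_self_attention(cross, affinity)
mask = box_mask(H, W, (2, 2, 6, 6))

loss_random = localized_loss(refined, mask)  # attention spread over the image
loss_perfect = localized_loss(mask, mask)    # attention exactly on the box
```

In a training-free setting, a penalty of this kind would typically be differentiated with respect to the noisy latent at each denoising step and used to nudge the latent, leaving the model weights untouched. That update step is omitted here because the abstract does not specify it.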

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Image Processing and 3D Reconstruction

Methods: Diffusion