LoCo: Locally Constrained Training-Free Layout-to-Image Synthesis
Peiang Zhao, Han Li, Ruiyang Jin, S. Kevin Zhou

TL;DR
LoCo is a training-free method that improves layout-to-image synthesis by using localized attention and padding token constraints, enabling precise control over object placement and appearance in generated images.
Contribution
Introduces LoCo, a training-free framework whose two constraints, the Localized Attention Constraint and the Padding Token Constraint, enhance spatial control and semantic consistency in layout-to-image synthesis.
Findings
Outperforms existing methods in quality and accuracy
Enhances spatial control in image generation
Improves semantic consistency with layout instructions
Abstract
Recent text-to-image diffusion models have reached an unprecedented level in generating high-quality images. However, their exclusive reliance on textual prompts often falls short in precise control of image compositions. In this paper, we propose LoCo, a training-free approach for layout-to-image synthesis that excels in producing high-quality images aligned with both textual prompts and layout instructions. Specifically, we introduce a Localized Attention Constraint (LAC), which leverages the semantic affinity between pixels in self-attention maps to create precise representations of desired objects and to ensure that objects are placed accurately in their designated regions. We further propose a Padding Token Constraint (PTC) to leverage the semantic information embedded in previously neglected padding tokens, improving the consistency between object appearance and layout instructions. LoCo…
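As a rough illustration of the idea behind LAC, the sketch below propagates one token's cross-attention map through a self-attention (pixel affinity) map and then penalises attention mass that falls outside the target box. This is a minimal NumPy sketch under stated assumptions, not the authors' formulation: the function names, the matrix-product refinement, and the squared-ratio loss are all hypothetical choices made for illustration, since the abstract does not give the exact equations.

```python
import numpy as np

def box_to_mask(box, height, width):
    """Turn a (x0, y0, x1, y1) pixel box into a flattened 0/1 mask (hypothetical helper)."""
    x0, y0, x1, y1 = box
    mask = np.zeros((height, width))
    mask[y0:y1, x0:x1] = 1.0
    return mask.reshape(-1)

def refine_cross_attention(self_attn, cross_attn):
    """Spread a token's cross-attention along pixel-to-pixel affinities.

    self_attn:  (N, N) self-attention map over N = H*W pixels.
    cross_attn: (N,) cross-attention of one text token over the same pixels.
    The matrix product is an assumed refinement rule, not taken from the paper.
    """
    refined = self_attn @ cross_attn
    return refined / (refined.sum() + 1e-8)

def localized_attention_loss(refined, box_mask):
    """Assumed loss: close to 0 when all attention is inside the box, 1 when none is."""
    inside = (refined * box_mask).sum()
    total = refined.sum() + 1e-8
    return (1.0 - inside / total) ** 2

# Toy 4x4 latent grid with an identity affinity map, so refinement is a no-op.
H = W = 4
mask = box_to_mask((0, 0, 2, 2), H, W)
affinity = np.eye(H * W)

attn_inside = np.zeros(H * W)
attn_inside[0] = 1.0   # all attention on a pixel inside the box
attn_outside = np.zeros(H * W)
attn_outside[15] = 1.0  # all attention on a pixel outside the box

loss_inside = localized_attention_loss(refine_cross_attention(affinity, attn_inside), mask)
loss_outside = localized_attention_loss(refine_cross_attention(affinity, attn_outside), mask)
```

In a training-free setting, a loss of this kind would typically be differentiated with respect to the noisy latent at each denoising step rather than with respect to model weights; that step needs an autodiff framework and is left out here.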
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Image Processing and 3D Reconstruction
Methods: Diffusion
