Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis

Marianna Ohanyan; Hayk Manukyan; Zhangyang Wang; Shant Navasardyan; Humphrey Shi

arXiv:2406.04032 · cs.CV · June 7, 2024

TL;DR

Zero-Painter is a training-free framework for layout-conditional text-to-image synthesis. Given object masks, an individual description for each object, and a global text prompt, it generates images whose objects follow both their descriptions and their mask shapes.

Contribution

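The paper introduces a training-free, two-stage approach to layout-conditional text-to-image synthesis built on two new attention blocks: Prompt-Adjusted Cross-Attention (PACA) and Region-Grouped Cross-Attention (ReGCA). The authors report that it surpasses current state-of-the-art methods in preserving textual details and adhering to mask shapes.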

Findings

01. Surpasses current state-of-the-art methods in preserving the details given in the text.

02. Generated objects adhere closely to the shapes of their input masks.

03. Generated objects align with their individual textual descriptions.

Abstract

We present Zero-Painter, a novel training-free framework for layout-conditional text-to-image synthesis that facilitates the creation of detailed and controlled imagery from textual prompts. Our method utilizes object masks and individual descriptions, coupled with a global text prompt, to generate images with high fidelity. Zero-Painter employs a two-stage process involving our novel Prompt-Adjusted Cross-Attention (PACA) and Region-Grouped Cross-Attention (ReGCA) blocks, ensuring precise alignment of generated objects with textual prompts and mask shapes. Our extensive experiments demonstrate that Zero-Painter surpasses current state-of-the-art methods in preserving textual details and adhering to mask shapes.
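The inputs named in the abstract (object masks, individual descriptions, and a global text prompt) can be pictured as a small data structure. The following is a minimal sketch under stated assumptions, not the official implementation: `ObjectSpec`, `LayoutRequest`, `region_map`, and `prompt_for_pixel` are hypothetical names, masks are plain nested lists of 0/1, and the rule that masks must not overlap is an assumption of this sketch rather than something the source states. It does not implement PACA or ReGCA; it only illustrates the general idea of routing each image region to its own text.

```python
from dataclasses import dataclass
from typing import List

Mask = List[List[int]]  # binary mask as rows of 0/1 (hypothetical representation)


@dataclass
class ObjectSpec:
    """One object in the layout: where it goes and what it is."""
    mask: Mask         # 1 where the object should appear
    description: str   # individual text description of this object


@dataclass
class LayoutRequest:
    """Inputs named in the abstract: masks, per-object descriptions, global prompt."""
    global_prompt: str
    objects: List[ObjectSpec]


def region_map(request: LayoutRequest) -> List[List[int]]:
    """Build a per-pixel label map: 0 = background, i + 1 = objects[i].

    Raises ValueError if masks differ in size or overlap. The no-overlap
    rule is an assumption of this sketch, not a rule from the source.
    """
    if not request.objects or not request.objects[0].mask:
        raise ValueError("layout needs at least one non-empty mask")
    height = len(request.objects[0].mask)
    width = len(request.objects[0].mask[0])
    labels = [[0] * width for _ in range(height)]
    for index, obj in enumerate(request.objects, start=1):
        if len(obj.mask) != height or any(len(row) != width for row in obj.mask):
            raise ValueError(f"mask {index} has a different size")
        for y, row in enumerate(obj.mask):
            for x, value in enumerate(row):
                if not value:
                    continue
                if labels[y][x] != 0:
                    raise ValueError(f"mask {index} overlaps mask {labels[y][x]}")
                labels[y][x] = index
    return labels


def prompt_for_pixel(request: LayoutRequest, labels: List[List[int]], y: int, x: int) -> str:
    """Return the text a region-aware step could pair with pixel (y, x).

    Background pixels fall back to the global prompt (an illustrative choice).
    """
    label = labels[y][x]
    if label == 0:
        return request.global_prompt
    return request.objects[label - 1].description


# Usage: two objects on a 2x4 canvas.
request = LayoutRequest(
    global_prompt="a cat and a dog in a park",
    objects=[
        ObjectSpec(mask=[[1, 1, 0, 0], [1, 1, 0, 0]], description="a fluffy white cat"),
        ObjectSpec(mask=[[0, 0, 0, 1], [0, 0, 0, 1]], description="a brown dog"),
    ],
)
labels = region_map(request)
```

A label map of this kind is one simple way to decide which description applies to which pixels; how the paper's attention blocks actually use the masks is described in the paper itself and in the official repository.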

Code & Models

Repositories

picsart-ai-research/zero-painter (PyTorch, official)

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Advanced Vision and Imaging · Handwritten Text Recognition Techniques