Zero-shot spatial layout conditioning for text-to-image diffusion models

Guillaume Couairon; Marlène Careil; Matthieu Cord; Stéphane Lathuilière; Jakob Verbeek

arXiv:2306.13754 · cs.CV · June 27, 2023

Open Access

TL;DR

This paper introduces ZestGuide, a zero-shot segmentation guidance method for text-to-image diffusion models that enables precise spatial control using implicit segmentation maps without additional training.

Contribution

The paper presents a zero-shot approach that integrates segmentation guidance into pre-trained text-to-image diffusion models, improving spatial accuracy without additional training.

Findings

01. Enhanced spatial alignment with input masks

02. Improved mIoU scores on the COCO dataset

03. Maintained high image quality with better segmentation accuracy
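For readers unfamiliar with the mIoU metric mentioned in the findings, the following is a minimal sketch of how mean intersection-over-union between two label maps is commonly computed. It is not the paper's evaluation code: the function name `mean_iou`, its parameters, and the toy label maps are illustrative assumptions, and the evaluation protocol (for instance, how a segmentation is obtained from a generated image) is not described in this summary.

```python
from typing import Optional, Sequence


def mean_iou(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
    num_classes: int,
    ignore_label: Optional[int] = None,
) -> float:
    """Mean IoU between two 2D integer label maps of the same shape.

    Hypothetical helper for illustration only. Labels are assumed to lie
    in [0, num_classes); pixels whose target equals ignore_label are skipped.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t == ignore_label:
                continue
            if p == t:
                # Pixel belongs to both the predicted and target region of class p.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel belongs to the predicted region of p and the target region of t.
                union[p] += 1
                union[t] += 1
    # Average only over classes that appear in either map.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: a 2x2 input mask and a segmentation that disagrees on one pixel.
input_mask = [[0, 1], [1, 1]]
predicted = [[0, 0], [1, 1]]
score = mean_iou(predicted, input_mask, num_classes=2)
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. A higher value indicates that the compared segmentation follows the input mask more closely.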

Abstract

Large-scale text-to-image diffusion models have significantly improved the state of the art in generative image modelling and allow for an intuitive and powerful user interface to drive the image generation process. Expressing spatial constraints, e.g. to position specific objects in particular locations, is cumbersome using text; and current text-based image generation models are not able to accurately follow such instructions. In this paper we consider image generation from text associated with segments on the image canvas, which combines an intuitive natural language interface with precise spatial control over the generated content. We propose ZestGuide, a zero-shot segmentation guidance approach that can be plugged into pre-trained text-to-image diffusion models, and does not require any additional training. It leverages implicit segmentation maps that can be extracted from…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques

Methods: ALIGN · Diffusion