Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models

Robin Rombach; Andreas Blattmann; Björn Ommer

arXiv:2207.13038 · cs.CV · July 27, 2022 · 34 cites

TL;DR

This paper introduces retrieval-augmented diffusion models (RDMs) for artistic image synthesis. During training, the model is conditioned on nearest neighbors retrieved from an external database; at inference, that database is replaced with a more specialized, style-specific one. The summary reports that this gives better style control than prompt engineering.

Contribution

The authors propose a novel retrieval-augmented diffusion approach that allows flexible style specification in image synthesis, surpassing text prompt methods in effectiveness.

Findings

01. Retrieval-augmented models outperform prompt engineering in style control.
02. Specialized databases improve style specificity during inference.
03. Open-source code and models are provided for reproducibility.

Abstract

Novel architectures have recently improved generative image synthesis leading to excellent visual quality in various tasks. Of particular note is the field of "AI-Art", which has seen unprecedented growth with the emergence of powerful multimodal models such as CLIP. By combining speech and image synthesis models, so-called "prompt-engineering" has become established, in which carefully selected and composed sentences are used to achieve a certain visual style in the synthesized image. In this note, we present an alternative approach based on retrieval-augmented diffusion models (RDMs). In RDMs, a set of nearest neighbors is retrieved from an external database during training for each training instance, and the diffusion model is conditioned on these informative samples. During inference (sampling), we replace the retrieval database with a more specialized database that contains,…
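The retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `retrieve_neighbors`, the use of cosine similarity, the embedding dimension, and the random placeholder databases are all assumptions made for illustration. It only shows the idea of retrieving k nearest neighbors from one database and then swapping in a different, more specialized database at sampling time.

```python
import numpy as np

def normalize(x):
    # Scale vectors to unit length so a dot product equals cosine similarity.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def retrieve_neighbors(query, database, k):
    """Return indices and embeddings of the k rows of `database`
    most similar to `query` (cosine similarity; an assumed metric)."""
    sims = normalize(database) @ normalize(query)
    top = np.argsort(-sims, kind="stable")[:k]
    return top, database[top]

rng = np.random.default_rng(0)
dim = 8  # hypothetical embedding size

# Placeholder embeddings; a real system would obtain these from an encoder.
train_db = rng.normal(size=(100, dim))       # stand-in for the training-time database
style_db = rng.normal(size=(20, dim)) + 3.0  # stand-in for a style-specific database

query = rng.normal(size=dim)

# Training time: condition on neighbors from the general database.
idx_train, cond_train = retrieve_neighbors(query, train_db, k=4)

# Sampling time: same query, but the database is swapped.
idx_style, cond_style = retrieve_neighbors(query, style_db, k=4)
```

In this sketch the retrieved arrays `cond_train` and `cond_style` play the role of the conditioning samples; how a diffusion model actually consumes them is outside the scope of the example.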

Code & Models

Repositories

compvis/latent-diffusion (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis

Methods: Diffusion · Contrastive Language-Image Pre-training