eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

Yogesh Balaji; Seungjun Nah; Xun Huang; Arash Vahdat; Jiaming Song; Qinsheng Zhang; Karsten Kreis; Miika Aittala; Timo Aila; Samuli Laine; Bryan Catanzaro; Tero Karras; Ming-Yu Liu

arXiv:2211.01324·cs.CV·March 15, 2023·223 cites

TL;DR

eDiff-I introduces an ensemble of specialized diffusion models for different stages of text-to-image synthesis, improving text alignment and style transfer while maintaining high visual quality and efficiency.

Contribution

The paper proposes a novel ensemble approach with stage-specific diffusion models, enhancing text-image alignment and enabling style transfer and interactive editing.

Findings

1. Outperforms previous models on standard benchmarks.

2. Enables style transfer via CLIP image embeddings.

3. Supports interactive 'paint-with-words' image editing.

Abstract

Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis. Starting from random noise, such text-to-image diffusion models gradually synthesize images in an iterative fashion while conditioning on text prompts. We find that their synthesis behavior qualitatively changes throughout this process: Early in sampling, generation strongly relies on the text prompt to generate text-aligned content, while later, the text conditioning is almost entirely ignored. This suggests that sharing model parameters throughout the entire generation process may not be ideal. Therefore, in contrast to existing works, we propose to train an ensemble of text-to-image diffusion models specialized for different synthesis stages. To maintain training efficiency, we initially train a single model, which is then split into specialized models that are…
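The idea of using different denoisers at different synthesis stages can be sketched as a simple routing step inside the sampling loop. The following is a minimal illustration, not the authors' implementation: the `Expert` class, the three noise-level intervals, the `toy_denoiser` stand-in, and the example prompt are all assumptions made for this sketch, and the toy denoisers ignore the prompt entirely.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Expert:
    """Hypothetical container: one denoiser plus the noise range it covers."""
    name: str
    sigma_min: float  # inclusive lower bound of the noise interval
    sigma_max: float  # exclusive upper bound of the noise interval
    denoise: Callable[[float, float, str], float]


def select_expert(experts: Sequence[Expert], sigma: float) -> Expert:
    """Return the expert whose interval contains the current noise level."""
    for expert in experts:
        if expert.sigma_min <= sigma < expert.sigma_max:
            return expert
    raise ValueError(f"no expert covers noise level {sigma}")


def toy_denoiser(strength: float) -> Callable[[float, float, str], float]:
    """Stand-in for a trained network: shrinks x by a fixed factor."""
    def denoise(x: float, sigma: float, prompt: str) -> float:
        return x * (1.0 - strength)  # prompt and sigma unused in this toy
    return denoise


# Assumed split into three stages; the interval boundaries are illustrative.
EXPERTS: List[Expert] = [
    Expert("high_noise", 10.0, float("inf"), toy_denoiser(0.5)),
    Expert("mid_noise", 1.0, 10.0, toy_denoiser(0.25)),
    Expert("low_noise", 0.0, 1.0, toy_denoiser(0.1)),
]


def sample(x: float, sigmas: Sequence[float], prompt: str) -> Tuple[float, List[str]]:
    """Run a decreasing noise schedule, routing each step to one expert."""
    used: List[str] = []
    for sigma in sigmas:
        expert = select_expert(EXPERTS, sigma)
        x = expert.denoise(x, sigma, prompt)
        used.append(expert.name)
    return x, used


x_final, used = sample(1.0, [80.0, 20.0, 5.0, 0.5], "a corgi on a skateboard")
print(used)
```

Only one expert runs at each step, so in this sketch the per-step cost is that of a single denoiser call regardless of how many experts exist.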

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Residual Connection · Dropout · Inverse Square Root Schedule · Softmax · SentencePiece