eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Qinsheng Zhang, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, Tero Karras, Ming-Yu Liu

TL;DR
eDiff-I introduces an ensemble of specialized diffusion models for different stages of text-to-image synthesis, improving text alignment and enabling style transfer while maintaining high visual quality and efficiency.
Contribution
The paper proposes a novel ensemble approach with stage-specific diffusion models, enhancing text-image alignment and enabling style transfer and interactive editing.
Findings
Outperforms previous text-to-image models on standard benchmarks, such as zero-shot FID on COCO.
Enables style transfer via CLIP image embeddings.
Supports interactive 'paint-with-words' image editing.
Abstract
Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis. Starting from random noise, such text-to-image diffusion models gradually synthesize images in an iterative fashion while conditioning on text prompts. We find that their synthesis behavior qualitatively changes throughout this process: Early in sampling, generation strongly relies on the text prompt to generate text-aligned content, while later, the text conditioning is almost entirely ignored. This suggests that sharing model parameters throughout the entire generation process may not be ideal. Therefore, in contrast to existing works, we propose to train an ensemble of text-to-image diffusion models specialized for different synthesis stages. To maintain training efficiency, we initially train a single model, which is then split into specialized models that are…
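The two mechanisms in the abstract, splitting one trained model into stage-specific experts and routing each sampling step to the expert for its noise level, can be illustrated with a small Python toy. This is a minimal sketch under stated assumptions, not the eDiff-I implementation: ToyDenoiser, split_into_experts, ExpertEnsemble, the expert names, and the sigma boundaries are all hypothetical. The paper's actual splitting schedule and the per-interval fine-tuning are not reproduced; the sketch only shows a base model being cloned to initialize each expert and a router that picks exactly one expert per step.

```python
import bisect
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List, Sequence


@dataclass
class ToyDenoiser:
    """Hypothetical stand-in for a text-conditioned denoiser network."""
    name: str
    params: Dict[str, Any] = field(default_factory=dict)

    def __call__(self, x: List[float], sigma: float, text_emb: Any) -> List[float]:
        # Placeholder arithmetic only: a real network would use params and
        # text_emb to predict the denoised image at noise level sigma.
        return [v / (1.0 + sigma ** 2) for v in x]


def split_into_experts(base: ToyDenoiser, names: Sequence[str]) -> List[ToyDenoiser]:
    """Clone one trained base model into an initialization for each expert.

    Each copy would then be fine-tuned only on its own noise interval
    (that fine-tuning step is not shown here).
    """
    experts = []
    for name in names:
        expert = copy.deepcopy(base)  # independent copy of the weights
        expert.name = name
        experts.append(expert)
    return experts


class ExpertEnsemble:
    """Routes each denoising step to the expert that owns its noise level."""

    def __init__(self, experts: Sequence[ToyDenoiser], boundaries: Sequence[float]):
        # experts: ordered from lowest-noise to highest-noise interval.
        # boundaries: ascending sigma values separating adjacent intervals.
        if len(experts) != len(boundaries) + 1:
            raise ValueError("need exactly one more expert than boundaries")
        if list(boundaries) != sorted(boundaries):
            raise ValueError("boundaries must be sorted in ascending order")
        self.experts = list(experts)
        self.boundaries = list(boundaries)

    def select(self, sigma: float) -> ToyDenoiser:
        # bisect_right counts the boundaries <= sigma, which is the expert index.
        return self.experts[bisect.bisect_right(self.boundaries, sigma)]

    def __call__(self, x: List[float], sigma: float, text_emb: Any) -> List[float]:
        # Only the selected expert runs, so per-step compute equals one model's.
        return self.select(sigma)(x, sigma, text_emb)


if __name__ == "__main__":
    base = ToyDenoiser("base", {"w": [0.1, 0.2]})
    ensemble = ExpertEnsemble(
        split_into_experts(base, ["low", "mid", "high"]),
        boundaries=[0.5, 5.0],  # hypothetical sigma cut points
    )
    # Sampling runs from high noise to low noise.
    for sigma in [80.0, 2.0, 0.1]:
        print(sigma, ensemble.select(sigma).name)
```

Running the file prints high, mid, and low for sigma = 80.0, 2.0, and 0.1 respectively. In this toy, the high-sigma expert handles the early, text-dependent part of sampling and the low-sigma expert the late part, where the abstract notes text conditioning is largely ignored. Because only one expert is evaluated per step, per-step compute matches a single model of the same size; the extra cost is mainly in storing several sets of weights.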
Code & Models
- 🤗 stabilityai/stable-diffusion-xl-base-1.0 (model) · 2.0M dl · ♡ 7579
- 🤗 stabilityai/stable-diffusion-xl-refiner-1.0 (model) · 259k dl · ♡ 2030
- 🤗 apple/coreml-stable-diffusion-xl-base-ios (model) · ♡ 39
- 🤗 apple/coreml-stable-diffusion-xl-base (model) · 104 dl · ♡ 70
- 🤗 frankjoshua/stable-diffusion-xl-base-1.0 (model) · 52 dl · ♡ 1
- 🤗 frankjoshua/stable-diffusion-xl-refiner-1.0 (model) · 14 dl
- 🤗 cgburgos/sdxl-1-0-base (model) · 52 dl · ♡ 7
- 🤗 timothymhowe/stable-diffusion-xl-base-1.0 (model) · 38 dl · ♡ 2
- 🤗 Andyrasika/dreamviewer-sdxl-1.0 (model) · 31 dl · ♡ 6
- 🤗 remg1997/xl-1.0 (model) · 19 dl · ♡ 1
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Residual Connection · Dropout · Inverse Square Root Schedule · Softmax · SentencePiece
