SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, Robin Rombach

TL;DR
SDXL is an advanced latent diffusion model for high-resolution text-to-image synthesis, featuring a larger architecture, novel conditioning schemes, and a refinement process, achieving state-of-the-art results and promoting open research.
Contribution
The paper introduces SDXL, a significantly improved latent diffusion model with a larger backbone, new conditioning methods, and a refinement technique, surpassing previous Stable Diffusion versions.
Findings
- Drastically improved performance over previous Stable Diffusion models
- Achieves results competitive with state-of-the-art image generators
- Provides open access to code and models for research transparency
Abstract
We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: The increase of model parameters is mainly due to more attention blocks and a larger cross-attention context as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared to the previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators. In the spirit of promoting open research and fostering transparency in large model training and evaluation, we provide access to…
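The "larger cross-attention context" can be illustrated with a small sketch. It assumes, based on the full paper rather than the abstract above, that the per-token outputs of the two text encoders are concatenated along the channel axis, and that the encoder widths are 768 and 1280 with 77 tokens per prompt. The function name `build_context` and the random stand-in embeddings are hypothetical; no real encoder is run here.

```python
import numpy as np

# Assumed sizes (not stated in the abstract): 77 tokens per prompt,
# 768 channels from the first text encoder, 1280 from the second.
TOKENS, DIM_A, DIM_B = 77, 768, 1280


def build_context(emb_a, emb_b):
    """Concatenate two per-token text embeddings along the channel axis."""
    if emb_a.shape[:-1] != emb_b.shape[:-1]:
        raise ValueError("encoders must agree on batch and token dimensions")
    return np.concatenate([emb_a, emb_b], axis=-1)


# Random arrays stand in for real text-encoder outputs.
rng = np.random.default_rng(0)
emb_a = rng.standard_normal((1, TOKENS, DIM_A))
emb_b = rng.standard_normal((1, TOKENS, DIM_B))

context = build_context(emb_a, emb_b)
print(context.shape)  # (1, 77, 2048)
```

Under these assumptions the UNet's cross-attention layers would attend over a 2048-channel context instead of a single encoder's width, which is one source of the parameter increase the abstract mentions.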
Code & Models
- 🤗 stabilityai/stable-diffusion-xl-base-1.0 (model) · 2.0M downloads · ♡ 7579
- 🤗 stabilityai/stable-diffusion-xl-refiner-1.0 (model) · 259k downloads · ♡ 2030
- 🤗 apple/coreml-stable-diffusion-xl-base-ios (model) · ♡ 39
- 🤗 stabilityai/japanese-stable-diffusion-xl (model) · 46 downloads · ♡ 103
- 🤗 stabilityai/stable-diffusion-xl-base-0.9 (model) · 198 downloads · ♡ 1416
- 🤗 stabilityai/stable-diffusion-xl-refiner-0.9 (model) · 74 downloads · ♡ 334
- 🤗 snowkidy/stable-diffusion-xl-base-0.9 (model) · 104 downloads · ♡ 5
- 🤗 FFusion/FFusionXL-09-SDXL (model) · 49 downloads · ♡ 4
- 🤗 ZachNagengast/coreml-stable-diffusion-xl-v0-9-base (model) · 9 downloads
- 🤗 ZachNagengast/coreml-stable-diffusion-xl-v0-9-base-palletized (model) · 8 downloads
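As a rough illustration of how the base and refiner checkpoints listed above might be combined, here is a minimal sketch using the Hugging Face `diffusers` library. It assumes `diffusers` and `torch` are installed, that a CUDA GPU is available, and that the refiner is applied as a plain image-to-image pass over the base model's output; the helper name `generate` is hypothetical, and this is one possible usage, not necessarily the authors' exact procedure.

```python
BASE_ID = "stabilityai/stable-diffusion-xl-base-1.0"
REFINER_ID = "stabilityai/stable-diffusion-xl-refiner-1.0"


def generate(prompt, device="cuda"):
    """Generate an image with the base model, then refine it (hypothetical helper)."""
    # Imported lazily: both packages are heavy, and the checkpoints are
    # downloaded on first use.
    import torch
    from diffusers import (
        StableDiffusionXLImg2ImgPipeline,
        StableDiffusionXLPipeline,
    )

    base = StableDiffusionXLPipeline.from_pretrained(
        BASE_ID, torch_dtype=torch.float16
    ).to(device)
    refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        REFINER_ID, torch_dtype=torch.float16
    ).to(device)

    # Stage 1: text-to-image with the base model.
    draft = base(prompt=prompt).images[0]
    # Stage 2: post-hoc image-to-image refinement, reusing the same prompt.
    return refiner(prompt=prompt, image=draft).images[0]
```

Calling `generate("an astronaut riding a horse").save("sample.png")` would write the refined image to disk; the prompt and filename are placeholders.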
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
Methods: Latent Diffusion Model · Diffusion
