SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

Dustin Podell; Zion English; Kyle Lacey; Andreas Blattmann; Tim Dockhorn; Jonas Müller; Joe Penna; Robin Rombach

arXiv:2307.01952·cs.CV·July 6, 2023·308 cites

Open Access 5 Repos 10 Models 1 Datasets

TL;DR

SDXL is an advanced latent diffusion model for high-resolution text-to-image synthesis, featuring a larger architecture, novel conditioning schemes, and a refinement process, achieving state-of-the-art results and promoting open research.

Contribution

The paper introduces SDXL, a significantly improved latent diffusion model with a larger backbone, new conditioning methods, and a refinement technique, surpassing previous Stable Diffusion versions.

Findings

01. Drastically improved performance over previous Stable Diffusion models

02. Achieves results competitive with state-of-the-art image generators

03. Provides open access to code and models for research transparency

Abstract

We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: the increase in model parameters is mainly due to more attention blocks and a larger cross-attention context, as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared to the previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators. In the spirit of promoting open research and fostering transparency in large model training and evaluation, we provide access to…
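
The abstract attributes the larger cross-attention context to a second text encoder but does not say how the two encoders' outputs are combined. A minimal sketch of one way this can work, assuming the per-token features are concatenated along the channel axis: the token count (77), the two widths (768 and 1280), and the helper names `fake_encoder` and `build_context` are illustrative assumptions, not taken from the text above.

```python
import random

SEQ_LEN = 77      # assumed token count per prompt
WIDTH_A = 768     # assumed per-token width of the first text encoder
WIDTH_B = 1280    # assumed per-token width of the second text encoder


def fake_encoder(seq_len, width, seed):
    """Hypothetical stand-in for a text encoder: seq_len rows of `width` floats."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(seq_len)]


def build_context(tokens_a, tokens_b):
    """Concatenate the two encoders' features token by token (channel axis)."""
    if len(tokens_a) != len(tokens_b):
        raise ValueError("both encoders must emit the same number of tokens")
    return [row_a + row_b for row_a, row_b in zip(tokens_a, tokens_b)]


tokens_a = fake_encoder(SEQ_LEN, WIDTH_A, seed=0)
tokens_b = fake_encoder(SEQ_LEN, WIDTH_B, seed=1)
context = build_context(tokens_a, tokens_b)
print(len(context), len(context[0]))  # 77 2048
```

Under these assumptions each token carries 768 + 1280 = 2048 features, so every cross-attention layer in the UNet attends over a wider context than it would with a single encoder.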

Code & Models

Repositories

Models

Datasets

yyyzzzzyyy/sd3_5_fine_sixcard
dataset · 7.5k dl

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques

Methods: Latent Diffusion Model · Diffusion