Simpler Diffusion (SiD2): 1.5 FID on ImageNet512 with pixel-space diffusion
Emiel Hoogeboom, Thomas Mensink, Jonathan Heek, Kay Lamerigts, Ruiqi Gao, Tim Salimans

TL;DR
This paper demonstrates that pixel-space diffusion models can achieve state-of-the-art image synthesis quality at high resolutions, rivaling latent models in efficiency and performance through a simple scaling recipe.
Contribution
The authors introduce SiD2, a pixel-space diffusion model that attains competitive high-resolution image synthesis results with a straightforward scaling approach and architectural simplifications.
Findings
Achieved 1.5 FID on ImageNet512.
Set new state-of-the-art results on ImageNet128, ImageNet256, and Kinetics600.
Pixel-space models can be as efficient and high-quality as latent models.
Abstract
Latent diffusion models have become the popular choice for scaling up diffusion models for high resolution image synthesis. Compared to pixel-space models that are trained end-to-end, latent models are perceived to be more efficient and to produce higher image quality at high resolution. Here we challenge these notions, and show that pixel-space models can be very competitive to latent models both in quality and efficiency, achieving 1.5 FID on ImageNet512 and new SOTA results on ImageNet128, ImageNet256 and Kinetics600. We present a simple recipe for scaling end-to-end pixel-space diffusion models to high resolutions. 1: Use the sigmoid loss-weighting (Kingma & Gao, 2023) with our prescribed hyper-parameters. 2: Use our simplified memory-efficient architecture with fewer skip-connections. 3: Scale the model to favor processing the image at a high resolution with fewer parameters,…
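Step 1 of the recipe relies on the sigmoid loss-weighting of Kingma & Gao (2023), which weights each noise level by a logistic function of the log signal-to-noise ratio λ, w(λ) = sigmoid(b − λ). The following is a minimal sketch of that weighting applied to an ε-prediction MSE. The helper names and the `bias` value are illustrative assumptions, not the hyper-parameters prescribed in the paper. It also omits the schedule-dependent factor −dλ/dt that multiplies the weight in Kingma & Gao's formulation of the weighted loss.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_weight(logsnr: float, bias: float) -> float:
    """Sigmoid loss weight w(lambda) = sigmoid(bias - lambda), lambda = log-SNR.

    `bias` is a free (hypothetical) hyperparameter in this sketch; it shifts
    the log-SNR at which the weight transitions from ~1 to ~0.
    """
    return sigmoid(bias - logsnr)


def weighted_eps_loss(eps_true, eps_pred, logsnr: float, bias: float) -> float:
    # Hypothetical helper: per-example epsilon-prediction MSE scaled by w(lambda).
    # The -d(lambda)/dt schedule factor from Kingma & Gao (2023) is omitted.
    mse = sum((a - b) ** 2 for a, b in zip(eps_true, eps_pred)) / len(eps_true)
    return sigmoid_weight(logsnr, bias) * mse


if __name__ == "__main__":
    for lam in (-10.0, 0.0, 10.0):
        print(lam, sigmoid_weight(lam, bias=-3.0))
```

With this form, very noisy inputs (low λ) receive a weight close to 1, while nearly clean inputs (high λ) are down-weighted toward 0. The bias moves that transition along the log-SNR axis.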
Taxonomy
Topics: Brain Tumor Detection and Classification · Image Retrieval and Classification Techniques · Generative Adversarial Networks and Image Synthesis
Methods: Diffusion
