GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, Mark Chen

TL;DR
This paper introduces GLIDE, a text-guided diffusion model for photorealistic image generation and editing. Human evaluators prefer its samples to those from DALL-E, and the model can be fine-tuned for text-driven image inpainting.
Contribution
The paper presents a 3.5 billion parameter text-conditional diffusion model that uses classifier-free guidance for text-to-image synthesis and editing. Human evaluators prefer its samples to those from DALL-E for photorealism and caption similarity.
Findings
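Human evaluators prefer classifier-free guidance to CLIP guidance for both photorealism and caption similarity.
GLIDE samples are favored over DALL-E samples in human evaluations, even when DALL-E uses CLIP reranking.
The model can be fine-tuned for image inpainting, enabling text-driven image editing.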
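As a rough illustration of the classifier-free guidance named in the findings, the sketch below applies the standard guidance rule: the model's noise prediction for the empty caption is pushed toward (and past) its prediction for the real caption by a scale factor. The function name, the flat lists standing in for image-shaped noise tensors, and the scale value of 3.0 are assumptions made for this example, not code or settings from the GLIDE release.

```python
from typing import List, Sequence


def classifier_free_guidance(
    eps_cond: Sequence[float],
    eps_uncond: Sequence[float],
    guidance_scale: float,
) -> List[float]:
    """Combine conditional and unconditional noise predictions.

    Computes eps_uncond + s * (eps_cond - eps_uncond) element-wise.
    s = 0 ignores the caption, s = 1 is plain conditional sampling,
    and s > 1 extrapolates toward the caption, trading diversity
    for fidelity.
    """
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("predictions must have the same length")
    return [
        u + guidance_scale * (c - u)
        for c, u in zip(eps_cond, eps_uncond)
    ]


# Toy stand-ins for the two forward passes of a diffusion model
# (hypothetical values; a real model would return image-shaped tensors).
eps_uncond = [0.0, 1.0, -1.0]   # prediction with an empty caption
eps_cond = [0.5, 1.0, -2.0]     # prediction with the text caption

guided = classifier_free_guidance(eps_cond, eps_uncond, guidance_scale=3.0)
print(guided)
```

In a sampler, the guided prediction would replace the raw conditional prediction at every denoising step, which means two forward passes per step (with and without the caption).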
Abstract
Diffusion models have recently been shown to generate high-quality synthetic images, especially when paired with a guidance technique to trade off diversity for fidelity. We explore diffusion models for the problem of text-conditional image synthesis and compare two different guidance strategies: CLIP guidance and classifier-free guidance. We find that the latter is preferred by human evaluators for both photorealism and caption similarity, and often produces photorealistic samples. Samples from a 3.5 billion parameter text-conditional diffusion model using classifier-free guidance are favored by human evaluators to those from DALL-E, even when the latter uses expensive CLIP reranking. Additionally, we find that our models can be fine-tuned to perform image inpainting, enabling powerful text-driven image editing. We train a smaller model on a filtered dataset and release the code and…
Videos
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models · YouTube
OpenAI GLIDE AI: Astounding Power! 🤖 · YouTube
Diffusion models explained. How does OpenAI's GLIDE work? · YouTube
OpenAI GLIDE (Diffusion) | ML Coding series | Towards Photorealistic Image Generation and Editing · YouTube
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques
Methods: Diffusion · Guided Language to Image Diffusion for Generation and Editing · Contrastive Language-Image Pre-training
