GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

Alex Nichol; Prafulla Dhariwal; Aditya Ramesh; Pranav Shyam; Pamela Mishkin; Bob McGrew; Ilya Sutskever; Mark Chen

arXiv:2112.10741·cs.CV·March 9, 2022·997 cites

TL;DR

This paper introduces GLIDE, a text-guided diffusion model for photorealistic image generation and editing. Human evaluators prefer its samples to those from DALL-E, and the model can be fine-tuned for text-driven image inpainting.

Contribution

The paper presents a 3.5 billion parameter text-conditional diffusion model that uses classifier-free guidance for text-to-image synthesis and editing. Human evaluators favor its samples over those from DALL-E for both photorealism and caption similarity.

Findings

01

Human evaluators prefer classifier-free guidance to CLIP guidance for both photorealism and caption similarity.

02

GLIDE samples are favored over DALL-E samples in human evaluations, even when DALL-E uses CLIP reranking.

03

Model can be fine-tuned for image inpainting and editing.

Abstract

Diffusion models have recently been shown to generate high-quality synthetic images, especially when paired with a guidance technique to trade off diversity for fidelity. We explore diffusion models for the problem of text-conditional image synthesis and compare two different guidance strategies: CLIP guidance and classifier-free guidance. We find that the latter is preferred by human evaluators for both photorealism and caption similarity, and often produces photorealistic samples. Samples from a 3.5 billion parameter text-conditional diffusion model using classifier-free guidance are favored by human evaluators to those from DALL-E, even when the latter uses expensive CLIP reranking. Additionally, we find that our models can be fine-tuned to perform image inpainting, enabling powerful text-driven image editing. We train a smaller model on a filtered dataset and release the code and…
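
The abstract does not spell out how classifier-free guidance combines model outputs, so the following is a minimal sketch, assuming the standard formulation in which the guided noise prediction extrapolates from the unconditional prediction toward the text-conditional one. The function name `classifier_free_guidance`, the use of plain Python lists, and the example values are illustrative assumptions, not taken from the released code.

```python
from typing import List, Sequence


def classifier_free_guidance(
    eps_cond: Sequence[float],
    eps_uncond: Sequence[float],
    scale: float,
) -> List[float]:
    """Combine conditional and unconditional noise predictions.

    Assumed formulation: eps_hat = eps_uncond + scale * (eps_cond - eps_uncond).
    scale = 0 gives the unconditional prediction, scale = 1 the conditional
    one, and scale > 1 pushes further in the direction of the text condition.
    """
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


# Hypothetical two-element "noise predictions", for illustration only.
guided = classifier_free_guidance([1.0, 2.0], [0.0, 1.0], scale=3.0)
```

In a real sampler the two predictions would come from the same network, evaluated once with the caption and once with an empty caption. A larger scale corresponds to the trade of diversity for fidelity that the abstract describes.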

Code & Models

Repositories

Models

🤗 fusing/glide-base · model · ♡ 2

Videos

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models· youtube

OpenAI GLIDE AI: Astounding Power! 🤖· youtube

Diffusion models explained. How does OpenAI's GLIDE work?· youtube

OpenAI GLIDE (Diffusion) | ML Coding series | Towards Photorealistic Image Generation and Editing· youtube

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques

Methods: Diffusion · Guided Language to Image Diffusion for Generation and Editing · Contrastive Language-Image Pre-training