Zero-Shot Text-to-Image Generation

Aditya Ramesh; Mikhail Pavlov; Gabriel Goh; Scott Gray; Chelsea Voss; Alec Radford; Mark Chen; Ilya Sutskever

arXiv:2102.12092 · cs.CV · March 2, 2021 · 1.1k cites

TL;DR

This paper introduces a simple transformer-based method for text-to-image generation that autoregressively models text and image tokens as a single stream. With sufficient data and scale, it is competitive with previous domain-specific models when evaluated zero-shot.

Contribution

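The authors propose an autoregressive transformer that models text and image tokens as a single stream of data, in place of the complex architectures, auxiliary losses, or side information (such as object part labels or segmentation masks) that earlier approaches might involve.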

Findings

01 Competitive zero-shot performance on text-to-image tasks

02 Achieves results comparable to domain-specific models

03 Simplifies the modeling approach with a single-stream transformer

Abstract

Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset. These assumptions might involve complex architectures, auxiliary losses, or side information such as object part labels or segmentation masks supplied during training. We describe a simple approach for this task based on a transformer that autoregressively models the text and image tokens as a single stream of data. With sufficient data and scale, our approach is competitive with previous domain-specific models when evaluated in a zero-shot fashion.
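
The "single stream" idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the vocabulary sizes, the offset scheme for placing image tokens after the text vocabulary, and the helper names `build_stream` and `next_token_pairs` are all hypothetical choices made for illustration. The sketch only shows how text and image tokens might be concatenated into one sequence and turned into next-token prediction examples for an autoregressive model.

```python
from typing import List, Tuple

# Hypothetical toy vocabulary sizes, chosen only for illustration.
TEXT_VOCAB_SIZE = 16
IMAGE_VOCAB_SIZE = 8


def build_stream(text_tokens: List[int], image_tokens: List[int]) -> List[int]:
    """Concatenate text and image tokens into one sequence.

    Image token ids are shifted by TEXT_VOCAB_SIZE so both modalities
    share a single id space (an assumed scheme, not taken from the paper).
    """
    if any(not 0 <= t < TEXT_VOCAB_SIZE for t in text_tokens):
        raise ValueError("text token id out of range")
    if any(not 0 <= t < IMAGE_VOCAB_SIZE for t in image_tokens):
        raise ValueError("image token id out of range")
    return list(text_tokens) + [TEXT_VOCAB_SIZE + t for t in image_tokens]


def next_token_pairs(stream: List[int]) -> List[Tuple[List[int], int]]:
    """Return (context, target) pairs for autoregressive training.

    Each target is the token that follows its context in the stream.
    """
    return [(stream[:i], stream[i]) for i in range(1, len(stream))]


stream = build_stream([3, 5], [0, 7])
pairs = next_token_pairs(stream)
for context, target in pairs:
    print(context, "->", target)
```

Because the image tokens follow the text tokens in the same sequence, every image token is predicted with the full caption in its context, which is what allows a model trained this way to be conditioned on text at generation time.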

Code & Models

Videos

DALL-E: Zero-Shot Text-to-Image Generation | Paper Explained · YouTube

Zero-Shot Text-to-Image Generation · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications

Methods: Adam · 1-bit Adam