Vector Quantized Diffusion Model for Text-to-Image Synthesis

Shuyang Gu; Dong Chen; Jianmin Bao; Fang Wen; Bo Zhang; Dongdong Chen; Lu Yuan; Baining Guo

arXiv:2111.14822·cs.CV·March 4, 2022·40 cites

Open Access · 2 Repos

TL;DR

The paper introduces VQ-Diffusion, a novel text-to-image generation model that leverages vector quantization and diffusion processes to produce higher quality images more efficiently than previous methods.

Contribution

It proposes a latent-space diffusion approach that models the discrete latent space of a VQ-VAE with a conditional DDPM variant and a mask-and-replace strategy, improving image quality over autoregressive models of similar size and speeding up generation.

Findings

01

VQ-Diffusion outperforms autoregressive models in image quality.

02

The method handles complex scenes better than GAN-based approaches.

03

With reparameterization, generation is fifteen times faster than with traditional autoregressive methods.

Abstract

We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation. This method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent space is modeled by a conditional variant of the recently developed Denoising Diffusion Probabilistic Model (DDPM). We find that this latent-space method is well-suited for text-to-image generation tasks because it not only eliminates the unidirectional bias with existing methods but also allows us to incorporate a mask-and-replace diffusion strategy to avoid the accumulation of errors, which is a serious problem with existing methods. Our experiments show that the VQ-Diffusion produces significantly better text-to-image generation results when compared with conventional autoregressive (AR) models with similar numbers of parameters. Compared with previous GAN-based text-to-image methods, our VQ-Diffusion can…
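The mask-and-replace idea mentioned in the abstract can be illustrated with a small sketch of one forward corruption step on discrete VQ-VAE token indices. This is not the authors' implementation: the abstract gives no transition probabilities, so the function name `mask_and_replace_step`, the per-step probabilities `mask_prob` and `replace_prob`, and the choice to reserve index `num_codes` for the [MASK] token are all assumptions made for illustration.

```python
import numpy as np


def mask_and_replace_step(tokens, num_codes, mask_prob, replace_prob, rng):
    """Apply one illustrative corruption step to an array of codebook indices.

    Assumptions of this sketch (not taken from the paper's text above):
    - valid codebook indices are 0 .. num_codes - 1
    - index num_codes is reserved for the [MASK] token
    - mask_prob + replace_prob <= 1
    """
    mask_id = num_codes
    tokens = np.asarray(tokens)
    out = tokens.copy()

    # One uniform draw per token decides its fate in this step.
    u = rng.random(tokens.shape)
    already_masked = tokens == mask_id

    # Tokens that are already masked stay masked (absorbing state).
    to_mask = (~already_masked) & (u < mask_prob)
    to_replace = (
        (~already_masked) & (u >= mask_prob) & (u < mask_prob + replace_prob)
    )

    out[to_mask] = mask_id
    # Replaced tokens are redrawn uniformly from the codebook;
    # a redraw may happen to return the original index.
    out[to_replace] = rng.integers(0, num_codes, size=int(to_replace.sum()))
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = np.arange(16) % 8  # toy sequence of indices from an 8-entry codebook
    print(mask_and_replace_step(latent, 8, mask_prob=0.2, replace_prob=0.1, rng=rng))
```

Making [MASK] an absorbing state tells a denoising network which positions were corrupted, while the occasional random replacement forces it to also reconsider unmasked tokens. A real model would vary both probabilities with the timestep according to a noise schedule, which this sketch leaves out.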

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization

Methods: Diffusion