Vector Quantized Diffusion Model for Text-to-Image Synthesis
Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, Baining Guo

TL;DR
The paper introduces VQ-Diffusion, a novel text-to-image generation model that leverages vector quantization and diffusion processes to produce higher quality images more efficiently than previous methods.
Contribution
It proposes a new latent-space diffusion approach using VQ-VAE and a mask-and-replace strategy, significantly improving image quality and generation speed over existing models.
Findings
VQ-Diffusion outperforms autoregressive models in image quality.
The method handles complex scenes better than GAN-based approaches.
With a reparameterization, generation is about fifteen times faster than with traditional AR methods.
Abstract
We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation. This method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent space is modeled by a conditional variant of the recently developed Denoising Diffusion Probabilistic Model (DDPM). We find that this latent-space method is well-suited for text-to-image generation tasks because it not only eliminates the unidirectional bias with existing methods but also allows us to incorporate a mask-and-replace diffusion strategy to avoid the accumulation of errors, which is a serious problem with existing methods. Our experiments show that the VQ-Diffusion produces significantly better text-to-image generation results when compared with conventional autoregressive (AR) models with similar numbers of parameters. Compared with previous GAN-based text-to-image methods, our VQ-Diffusion can…
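
The mask-and-replace strategy named in the abstract can be illustrated with a minimal sketch of one corruption step on discrete VQ-VAE tokens. This is an illustration under stated assumptions, not the authors' implementation: the function name mask_and_replace_step, the MASK sentinel, and the fixed probabilities p_mask and p_replace are hypothetical, and a real noise schedule would vary these probabilities per timestep.

```python
import random

MASK = -1  # hypothetical sentinel id standing in for a [MASK] token


def mask_and_replace_step(tokens, num_codes, p_mask, p_replace, rng):
    """Apply one illustrative corruption step to a list of codebook indices.

    Each unmasked token is masked with probability p_mask, resampled
    uniformly from the codebook with probability p_replace, and kept
    unchanged otherwise. Tokens that are already masked stay masked.
    All parameter names are assumptions made for this sketch.
    """
    if p_mask < 0 or p_replace < 0 or p_mask + p_replace > 1:
        raise ValueError("p_mask and p_replace must be >= 0 and sum to <= 1")
    out = []
    for tok in tokens:
        if tok == MASK:
            # Masking is absorbing: a masked token never becomes unmasked here.
            out.append(MASK)
            continue
        u = rng.random()
        if u < p_mask:
            out.append(MASK)
        elif u < p_mask + p_replace:
            # Uniform resampling may by chance return the original index.
            out.append(rng.randrange(num_codes))
        else:
            out.append(tok)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = [3, 7, 1, 4, 0, 5]
    print(mask_and_replace_step(tokens, num_codes=8, p_mask=0.3, p_replace=0.1, rng=rng))
```

In this sketch a denoising network would see explicit mask tokens marking positions that carry no information, alongside a smaller share of silently replaced tokens that it has to detect from context.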
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
Methods: Diffusion
