Generating Diverse High-Fidelity Images with VQ-VAE-2

Ali Razavi; Aaron van den Oord; Oriol Vinyals

arXiv:1906.00446·cs.LG·June 4, 2019·106 cites

TL;DR

This paper introduces VQ-VAE-2, a generative model that pairs a multi-scale hierarchical VQ-VAE with powerful autoregressive priors over its latent codes. It produces diverse, high-fidelity images whose quality rivals that of state-of-the-art GANs on datasets such as ImageNet.

Contribution

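The paper presents a multi-scale hierarchical VQ-VAE with scaled-up autoregressive priors over the latent codes. Because the autoregressive model is sampled only in the compressed latent space, sampling is faster than in pixel space, and the samples are more coherent and of higher fidelity than those of earlier VQ-VAE models.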

Findings

01. Generated images rival GAN samples in quality on ImageNet.

02. Sampling the autoregressive prior in latent space is an order of magnitude faster than sampling in pixel space, especially for large images.

03. The model avoids known GAN shortcomings such as mode collapse and lack of diversity.
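
The speed finding above can be made concrete with a rough count of sequential sampling steps. The sketch below is illustrative only: the 256x256 RGB image size and the 32x32 and 64x64 latent grid sizes are assumptions chosen for the example, and `autoregressive_steps` is a hypothetical helper. Step counts are a proxy, not a measurement of wall-clock time.

```python
def autoregressive_steps(height, width, channels=1):
    """Number of sequential sampling steps for a grid sampled one value at a time."""
    return height * width * channels


# Assumed example sizes (not taken from this page): a 256x256 RGB image,
# and a two-level latent hierarchy with 32x32 and 64x64 code grids.
pixel_steps = autoregressive_steps(256, 256, channels=3)
latent_steps = autoregressive_steps(32, 32) + autoregressive_steps(64, 64)

# Ratio of sequential steps; the feed-forward decoder then maps codes to pixels in one pass.
speedup = pixel_steps / latent_steps

print(f"pixel-space steps:  {pixel_steps}")
print(f"latent-space steps: {latent_steps}")
print(f"step ratio:         {speedup:.1f}x")
```

Under these assumed sizes the latent hierarchy needs far fewer sequential steps than a pixel-level model, which is consistent with the order-of-magnitude claim in the abstract.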

Abstract

We explore the use of Vector Quantized Variational AutoEncoder (VQ-VAE) models for large scale image generation. To this end, we scale and enhance the autoregressive priors used in VQ-VAE to generate synthetic samples of much higher coherence and fidelity than possible before. We use simple feed-forward encoder and decoder networks, making our model an attractive candidate for applications where the encoding and/or decoding speed is critical. Additionally, VQ-VAE requires sampling an autoregressive model only in the compressed latent space, which is an order of magnitude faster than sampling in the pixel space, especially for large images. We demonstrate that a multi-scale hierarchical organization of VQ-VAE, augmented with powerful priors over the latent codes, is able to generate samples with quality that rivals that of state of the art Generative Adversarial Networks on multifaceted…
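
The "compressed latent space" in the abstract is discrete: in a VQ-VAE, each encoder output vector is replaced by its nearest entry in a learned codebook, and the autoregressive prior models the resulting grid of integer indices. A minimal sketch of that lookup follows. The codebook values, the vector dimension, and the function name `quantize` are hypothetical, and the sketch leaves out training details and the multi-scale hierarchy.

```python
def quantize(vectors, codebook):
    """Map each vector to its nearest codebook entry by squared Euclidean distance.

    Returns the list of chosen indices and the corresponding codebook vectors.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    indices = [
        min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
        for v in vectors
    ]
    quantized = [codebook[i] for i in indices]
    return indices, quantized


# Toy codebook with 3 entries of dimension 2 (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]

# Stand-ins for encoder outputs at three spatial positions.
encoder_outputs = [[0.9, 1.2], [-0.2, 0.1], [-0.8, 1.7]]

indices, quantized = quantize(encoder_outputs, codebook)
print(indices)    # discrete codes that a prior would model
print(quantized)  # vectors that a decoder would receive
```

At generation time the order is reversed: the prior samples indices, the indices select codebook vectors, and the feed-forward decoder turns those vectors into an image.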

Code & Models

Videos

AI Creates Near Perfect Images Of People, Dogs and More · YouTube

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Digital Media Forensic Detection

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · PixelCNN · Batch Normalization · Convolution · Residual Connection · Residual Block · Dense Connections · Feedforward Network · VQ-VAE-2