Neural Discrete Representation Learning
Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu

TL;DR
This paper introduces VQ-VAE, a generative model that learns discrete representations to improve unsupervised learning and generation of high-quality images, videos, and speech, addressing issues like posterior collapse in VAEs.
Contribution
The paper presents VQ-VAE, a model that combines vector quantisation with variational autoencoders so that the encoder outputs discrete latent codes and the prior over those codes is learnt rather than static.
Findings
Addresses posterior collapse in VAEs
Generates high-quality images, videos, and speech
Enables unsupervised phoneme learning
Abstract
Learning useful representations without supervision remains a key challenge in machine learning. In this paper, we propose a simple yet powerful generative model that learns such discrete representations. Our model, the Vector Quantised-Variational AutoEncoder (VQ-VAE), differs from VAEs in two key ways: the encoder network outputs discrete, rather than continuous, codes; and the prior is learnt rather than static. In order to learn a discrete latent representation, we incorporate ideas from vector quantisation (VQ). Using the VQ method allows the model to circumvent issues of "posterior collapse" -- where the latents are ignored when they are paired with a powerful autoregressive decoder -- typically observed in the VAE framework. Pairing these representations with an autoregressive prior, the model can generate high quality images, videos, and speech as well as doing high quality…
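The vector quantisation idea mentioned in the abstract can be illustrated with a minimal sketch, assuming that quantisation means replacing each continuous encoder output with its nearest entry in a codebook of embedding vectors. The function name `quantise`, the array shapes, and the toy values below are hypothetical and chosen only for illustration; this is not the authors' implementation and it omits training entirely.

```python
import numpy as np

def quantise(z_e, codebook):
    """Map each continuous vector to its nearest codebook entry.

    z_e:      (N, D) array of continuous encoder outputs (assumed shape)
    codebook: (K, D) array of K embedding vectors (assumed shape)
    Returns (indices, z_q): integer codes of shape (N,) and the
    corresponding quantised vectors of shape (N, D).
    """
    # Squared Euclidean distance from every input to every codebook entry: (N, K)
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    # The discrete code is the index of the closest codebook entry
    indices = dists.argmin(axis=1)
    # The quantised latent is the selected codebook vector itself
    z_q = codebook[indices]
    return indices, z_q

# Toy example: three 2-D codebook entries and three encoder outputs
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
z_e = np.array([[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]])
indices, z_q = quantise(z_e, codebook)
print(indices)  # [0 1 2]
```

In this toy case each input lies closest to a different codebook entry, so the discrete codes are 0, 1 and 2 and the quantised vectors are the codebook rows themselves.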
Videos
VQ-VAEs: Neural Discrete Representation Learning | Paper + PyTorch Code Explained · YouTube
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
Methods: Dilated Causal Convolution · Adam · Batch Normalization · Residual Connection · Residual Block · Convolution · VQ-VAE
