Theory and Experiments on Vector Quantized Autoencoders

Aurko Roy; Ashish Vaswani; Arvind Neelakantan; Niki Parmar

arXiv:1805.11063 · cs.LG · July 23, 2018 · 57 cites

TL;DR

This paper explores improved training methods for vector quantized autoencoders, demonstrating enhanced image generation and faster non-autoregressive machine translation with performance close to autoregressive models.

Contribution

It introduces an EM-inspired training technique for VQ-VAE and combines it with knowledge distillation to improve image and machine translation tasks.

Findings

01 Better image generation on CIFAR-10

02 Non-autoregressive translation accuracy close to autoregressive models

03 Inference speed increased by 3.3 times

Abstract

Deep neural networks with discrete latent variables offer the promise of better symbolic reasoning, and learning abstractions that are more useful to new tasks. There has been a surge in interest in discrete latent variable models; however, despite several recent improvements, the training of discrete latent variable models has remained challenging and their performance has mostly failed to match their continuous counterparts. Recent work on vector quantized autoencoders (VQ-VAE) has made substantial progress in this direction, with its perplexity almost matching that of a VAE on datasets such as CIFAR-10. In this work, we investigate an alternate training technique for VQ-VAE, inspired by its connection to the Expectation Maximization (EM) algorithm. Training the discrete bottleneck with EM helps us achieve better image generation results on CIFAR-10, and together with knowledge…
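To make the EM connection mentioned in the abstract concrete, here is a minimal NumPy sketch of one generic soft-EM update of a vector-quantization codebook. It is an illustration under stated assumptions, not the authors' training procedure: the function name `em_codebook_step`, the `temperature` parameter, and the softmax-over-negative-distance E-step are all hypothetical choices made for this example.

```python
import numpy as np


def em_codebook_step(z_e, codebook, temperature=1.0):
    """One generic soft-EM update of a VQ codebook (illustrative only).

    z_e:      (n, d) array of encoder outputs.
    codebook: (k, d) array of embedding vectors.
    Returns the updated (k, d) codebook and the (n, k) soft assignments.
    """
    # Squared Euclidean distance from every encoder output to every code: (n, k).
    d2 = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)

    # E-step (assumed form): softmax over negative distances.
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(logits)
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: each code becomes the responsibility-weighted mean of the outputs.
    weights = resp.sum(axis=0)  # total mass per code, shape (k,)
    new_codebook = (resp.T @ z_e) / np.maximum(weights, 1e-8)[:, None]

    # Codes that received essentially no mass keep their previous value.
    dead = weights < 1e-8
    new_codebook[dead] = codebook[dead]
    return new_codebook, resp
```

As the temperature approaches zero the soft assignments become one-hot, and the update reduces to a k-means step (nearest-neighbour assignment followed by a cluster mean), which is the hard-EM view of the standard VQ-VAE codebook update.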

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Speech Recognition and Synthesis

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Adam · Dropout · Multi-Head Attention · Byte Pair Encoding · Dense Connections