Fast Decoding in Sequence Models using Discrete Latent Variables

Łukasz Kaiser; Aurko Roy; Ashish Vaswani; Niki Parmar; Samy Bengio; Jakob Uszkoreit; Noam Shazeer

arXiv:1803.03382 · cs.LG · June 11, 2018 · 178 cites

TL;DR

This paper introduces a novel approach to sequence modeling that uses discrete latent variables to significantly speed up decoding in neural machine translation, balancing efficiency and translation quality.

Contribution

The paper presents a new method for constructing discrete latent variables that enables faster decoding in sequence models, improving over previous non-autoregressive approaches.

Findings

01. Decoding is an order of magnitude faster than with comparable autoregressive models.

02. The model achieves higher BLEU scores than previous non-autoregressive translation models.

03. The approach keeps translation quality competitive while significantly reducing inference time.
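One way to see where a speedup of this size can come from is to count sequential decoding steps. A standard autoregressive decoder emits one token per step, whereas the latent-variable approach generates a shorter latent sequence one element at a time and then decodes all output tokens in a single parallel pass. The sketch below is an idealized count under assumed numbers: the sentence length of 64 and the compression factor of 8 are hypothetical, and it ignores differences in per-step cost, so it illustrates the scaling rather than reproducing the paper's measured timings.

```python
import math


def sequential_steps_autoregressive(target_len: int) -> int:
    """A standard autoregressive decoder emits one output token per step."""
    return target_len


def sequential_steps_latent(target_len: int, compression: int) -> int:
    """Idealized step count for the latent-variable decoder.

    The latent sequence has ceil(n / c) elements, generated one per step,
    followed by one parallel step that decodes all n output tokens.
    """
    latent_len = math.ceil(target_len / compression)
    return latent_len + 1


if __name__ == "__main__":
    n, c = 64, 8  # hypothetical sentence length and compression factor
    ar = sequential_steps_autoregressive(n)
    lv = sequential_steps_latent(n, c)
    print(ar, lv, round(ar / lv, 1))  # prints: 64 9 7.1
```

With these assumed values the sequential depth drops from 64 steps to 9. Actual wall-clock gains also depend on how expensive each step and the final parallel pass are.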

Abstract

Autoregressive sequence models based on deep neural networks, such as RNNs, WaveNet and the Transformer, attain state-of-the-art results on many tasks. However, they are difficult to parallelize and are thus slow at processing long sequences. RNNs lack parallelism both during training and decoding, while architectures like WaveNet and Transformer are much more parallelizable during training, yet still operate sequentially during decoding. Inspired by [arxiv:1711.00937], we present a method to extend sequence models using discrete latent variables that makes decoding much more parallelizable. We first auto-encode the target sequence into a shorter sequence of discrete latent variables, which at inference time is generated autoregressively, and finally decode the output sequence from this shorter latent sequence in parallel. To this end, we introduce a novel method for constructing a…
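The data flow the abstract describes (shorten the target, discretize it, then decode all positions at once) can be sketched with NumPy. This is a toy under loudly stated assumptions: `compress` (mean-pooling fixed-size chunks), `decode_parallel` (repeating code vectors) and the random codebook are hypothetical stand-ins for learned networks. The `quantize` step is nearest-codebook vector quantization in the style of the cited work [arxiv:1711.00937]; it is not the paper's own new construction, which the truncated abstract does not describe here.

```python
import numpy as np


def compress(target_emb: np.ndarray, compression: int) -> np.ndarray:
    """Shorten an (n, d) sequence to (ceil(n / c), d) by mean-pooling chunks.

    Hypothetical stand-in for a learned encoder. The last chunk is
    zero-padded, which slightly biases its mean in this toy version.
    """
    n, d = target_emb.shape
    m = -(-n // compression)  # ceil(n / c) with integer arithmetic
    pad = m * compression - n
    padded = np.concatenate([target_emb, np.zeros((pad, d))], axis=0)
    return padded.reshape(m, compression, d).mean(axis=1)


def quantize(latents: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each latent vector to the index of its nearest codebook entry."""
    # Squared Euclidean distance between every latent (m, d) and code (K, d).
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)  # shape (m,), integer codes


def decode_parallel(codes: np.ndarray, codebook: np.ndarray,
                    compression: int, target_len: int) -> np.ndarray:
    """Hypothetical decoder: produce all n positions in one vectorized op.

    Each code vector is repeated c times and the result is cut to n rows,
    instead of emitting one token per sequential step.
    """
    return np.repeat(codebook[codes], compression, axis=0)[:target_len]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, c, K = 20, 16, 8, 32  # assumed length, width, compression, codebook size
    target_emb = rng.normal(size=(n, d))
    codebook = rng.normal(size=(K, d))
    codes = quantize(compress(target_emb, c), codebook)
    print(codes.shape)  # prints: (3,)
    print(decode_parallel(codes, codebook, c, n).shape)  # prints: (20, 16)
```

At inference time the target is not available, so in the approach described above the short code sequence would instead be produced by an autoregressive model, and only that stage runs step by step.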

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Mixture of Logistic Distributions · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam