Fast Decoding in Sequence Models using Discrete Latent Variables
Łukasz Kaiser, Aurko Roy, Ashish Vaswani, Niki Parmar, Samy Bengio, Jakob Uszkoreit, Noam Shazeer

TL;DR
This paper introduces a novel approach to sequence modeling that uses discrete latent variables to significantly speed up decoding in neural machine translation, balancing efficiency and translation quality.
Contribution
The paper presents a new method for constructing discrete latent variables that enables faster decoding in sequence models, improving over previous non-autoregressive approaches.
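To make "discrete latent variables" concrete, the sketch below shows nearest-neighbour vector quantization. This is the discretization used in the cited VQ-VAE line of work [arxiv:1711.00937], not necessarily the paper's own new construction. It maps each continuous encoder output to the index of the closest entry in a codebook. The codebook, vectors, and function names are hypothetical toy values chosen for illustration.

```python
# Minimal sketch: nearest-neighbour vector quantization, one common way to turn
# continuous encoder outputs into discrete latent codes.
# The codebook and encodings are hypothetical toy values, not from the paper.
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(encodings: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each continuous encoding to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(e, codebook[k]))
        for e in encodings
    ]


codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
encodings = [[0.9, 1.2], [0.1, -0.2], [-0.8, 0.7]]
codes = quantize(encodings, codebook)
print(codes)  # [1, 0, 2]
```

In a trained model the codebook is learned jointly with the encoder. Here it is fixed so that the lookup itself is easy to follow.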
Findings
Decoding is an order of magnitude faster than with comparable autoregressive models.
The model achieves higher BLEU scores than previous non-autoregressive translation models.
The approach maintains competitive translation quality while significantly reducing inference time.
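Much of the speedup comes from the number of sequential decoder steps. An autoregressive model needs one step per output token. A latent model needs one step per latent plus a single parallel pass over the output. The sketch below counts only these sequential calls. It ignores per-step cost, so it is a rough proxy and does not reproduce the paper's measured speedups. The compression factor `k` is a hypothetical parameter, not a value taken from the paper.

```python
# Rough proxy for decoding latency: count sequential decoder invocations.
# The compression factor k is a hypothetical parameter, not a value from the paper.
import math


def autoregressive_steps(target_len: int) -> int:
    """A standard autoregressive decoder emits one token per sequential step."""
    return target_len


def latent_steps(target_len: int, k: int) -> int:
    """Generate ceil(n / k) latents one at a time, then decode all n output
    tokens in a single parallel step."""
    return math.ceil(target_len / k) + 1


for n in (32, 64, 128):
    ar = autoregressive_steps(n)
    lat = latent_steps(n, k=8)
    print(f"n={n}: autoregressive={ar}, latent={lat}, ratio={ar / lat:.1f}x")
```

The ratio grows with sequence length, because the single parallel pass is amortized over more tokens.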
Abstract
Autoregressive sequence models based on deep neural networks, such as RNNs, WaveNet and the Transformer, attain state-of-the-art results on many tasks. However, they are difficult to parallelize and are thus slow at processing long sequences. RNNs lack parallelism both during training and decoding, while architectures like WaveNet and Transformer are much more parallelizable during training, yet still operate sequentially during decoding. Inspired by [arxiv:1711.00937], we present a method to extend sequence models using discrete latent variables that makes decoding much more parallelizable. We first auto-encode the target sequence into a shorter sequence of discrete latent variables, which at inference time is generated autoregressively, and finally decode the output sequence from this shorter latent sequence in parallel. To this end, we introduce a novel method for constructing a…
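The inference procedure described in the abstract can be sketched as a two-phase loop. In the first phase a short latent sequence is generated one element at a time, and in the second the full output is produced from it in one parallel pass. The model components are hypothetical stand-ins passed in as callables, and the toy codebook, in which each latent expands to two tokens, is purely illustrative. A real system would use trained networks for both phases.

```python
# Structural sketch of latent-variable decoding: (1) generate a short latent
# sequence autoregressively, (2) decode the full output from it in parallel.
# All model components here are hypothetical stand-ins, not the paper's networks.
from typing import Callable, List, Sequence

LatentModel = Callable[[Sequence[str], List[int]], int]
ParallelDecoder = Callable[[Sequence[str], List[int]], List[str]]


def latent_decode(
    source: Sequence[str],
    num_latents: int,
    next_latent: LatentModel,
    decode_all: ParallelDecoder,
) -> List[str]:
    latents: List[int] = []
    # Sequential phase: one call per latent, each conditioned on earlier latents.
    for _ in range(num_latents):
        latents.append(next_latent(source, latents))
    # Parallel phase: every output position is produced from the latents at once.
    return decode_all(source, latents)


# Toy stand-ins: each latent is a codebook index that expands to two tokens.
CODEBOOK = {0: ["das", "ist"], 1: ["ein", "Test"]}


def toy_next_latent(source: Sequence[str], latents: List[int]) -> int:
    return len(latents) % len(CODEBOOK)


def toy_decode_all(source: Sequence[str], latents: List[int]) -> List[str]:
    return [tok for z in latents for tok in CODEBOOK[z]]


out = latent_decode(["this", "is", "a", "test"], 2, toy_next_latent, toy_decode_all)
print(out)  # ['das', 'ist', 'ein', 'Test']
```

Only the loop over `num_latents` is sequential, which is why shortening the latent sequence shortens decoding time.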
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Mixture of Logistic Distributions · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam
