Convolutional Sequence to Sequence Learning
Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin

TL;DR
This paper introduces a sequence to sequence architecture built entirely from convolutional neural networks. Computation over all sequence elements parallelizes fully during training, optimization is easier than with recurrent models, and the model reaches higher translation accuracy than a deep LSTM baseline at an order of magnitude faster speed.
Contribution
The authors present a fully convolutional architecture with gated linear units and a separate attention module in each decoder layer, which outperforms the deep LSTM setup of Wu et al. (2016) on translation tasks.
Findings
Outperforms the deep LSTM setup of Wu et al. (2016) on WMT'14 English-German and WMT'14 English-French translation
Enables fully parallelized computation over all sequence elements during training
Runs an order of magnitude faster on both GPU and CPU
Abstract
The prevalent approach to sequence to sequence learning maps an input sequence to a variable length output sequence via recurrent neural networks. We introduce an architecture based entirely on convolutional neural networks. Compared to recurrent models, computations over all elements can be fully parallelized during training and optimization is easier since the number of non-linearities is fixed and independent of the input length. Our use of gated linear units eases gradient propagation and we equip each decoder layer with a separate attention module. We outperform the accuracy of the deep LSTM setup of Wu et al. (2016) on both WMT'14 English-German and WMT'14 English-French translation at an order of magnitude faster speed, both on GPU and CPU.
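The gated linear unit mentioned in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: the function names (`glu`, `conv_glu_layer`), the zero padding, and the toy weights are assumptions chosen for illustration, and attention and the rest of the model are left out. It shows one convolution over a sequence whose output is split into two halves `a` and `b` and combined as `a * sigmoid(b)`. Each output position depends only on a fixed window of inputs, so positions can be computed independently, and the layer applies one non-linearity regardless of sequence length.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def glu(a, b):
    """Gated linear unit: elementwise a * sigmoid(b)."""
    return [ai * sigmoid(bi) for ai, bi in zip(a, b)]


def conv_glu_layer(seq, weights, bias):
    """One convolution + GLU block (illustrative sketch, hypothetical helper).

    seq:     list of T vectors, each of length d
    weights: kernel indexed as weights[j][i][o], with kernel width k (assumed odd),
             input size d, and output size 2*d
    bias:    list of length 2*d
    Returns T vectors of length d; zero padding keeps the sequence length.
    """
    k = len(weights)
    d = len(seq[0])
    pad = k // 2
    zero = [0.0] * d
    padded = [zero] * pad + seq + [zero] * pad
    out = []
    for t in range(len(seq)):  # positions do not depend on each other
        y = list(bias)
        for j in range(k):
            x = padded[t + j]
            for i in range(d):
                for o in range(2 * d):
                    y[o] += x[i] * weights[j][i][o]
        a, b = y[:d], y[d:]  # first half is the value, second half is the gate
        out.append(glu(a, b))
    return out


# Toy usage with made-up weights: 3 positions, d = 2, kernel width 3.
d, k = 2, 3
weights = [[[0.1] * (2 * d) for _ in range(d)] for _ in range(k)]
bias = [0.0] * (2 * d)
seq = [[1.0, 2.0], [0.5, -1.0], [0.0, 3.0]]
out = conv_glu_layer(seq, weights, bias)
```

In this sketch the loop over `t` is written sequentially for readability; because no iteration reads another iteration's result, a real implementation could evaluate all positions at once, which is the property the abstract contrasts with recurrent models.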
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Algorithms and Data Compression
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
