Sequence to Sequence Learning with Neural Networks
Ilya Sutskever, Oriol Vinyals, Quoc V. Le

TL;DR
This paper introduces an end-to-end sequence-to-sequence learning approach based on deep LSTM networks. On the WMT'14 English-to-French translation task it reaches a BLEU score of 34.8, and it has no difficulty with long sentences.
Contribution
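The paper presents a neural network architecture for sequence-to-sequence learning that makes minimal assumptions about sequence structure: one multilayered LSTM maps the input sequence to a fixed-dimensional vector, and a second deep LSTM decodes the target sequence from that vector.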
Findings
The LSTM-based model achieved a BLEU score of 34.8 on the entire WMT'14 English-to-French test set, even though its score was penalized on out-of-vocabulary words.
Reversing the word order of the source sentences (but not the target sentences) improved model performance markedly.
The LSTM learned phrase and sentence representations that are sensitive to word order.
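The source-reversal finding can be illustrated with a minimal preprocessing sketch. It assumes sentences are already tokenized into lists of strings. The function name reverse_source and the toy tokens are hypothetical, chosen only for illustration, and are not taken from the paper's code.

```python
def reverse_source(pairs):
    """Reverse the token order of each source sentence; leave targets unchanged.

    `pairs` is a list of (source_tokens, target_tokens) tuples.
    A new list is returned and the input lists are not modified.
    """
    return [(list(reversed(src)), tgt) for src, tgt in pairs]


# Toy example with made-up tokens: the source "a b c" becomes "c b a",
# while the target sentence keeps its original order.
pairs = [(["a", "b", "c"], ["alpha", "beta", "gamma"])]
reversed_pairs = reverse_source(pairs)
```

After reversal, the first source words end up close to the first target words, which is one plausible reason the trick makes the mapping easier to learn.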
Abstract
Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT'14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences.…
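The encode-then-decode scheme described in the abstract can be written as a conditional probability. The symbols below are notation introduced here for illustration: x_1, ..., x_T is the input sequence, y_1, ..., y_{T'} is the target sequence (whose length T' may differ from T), and v is the fixed-dimensional vector produced by the encoder LSTM.

```latex
p(y_1, \dots, y_{T'} \mid x_1, \dots, x_T)
  = \prod_{t=1}^{T'} p(y_t \mid v, y_1, \dots, y_{t-1})
```

Each factor is a distribution over the target vocabulary, so the decoder LSTM generates the output one word at a time, conditioned on v and on the words it has already produced.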
Videos
Spence Green: Enterprise-scale Machine Translation (YouTube)
Sequence-to-Sequence (seq2seq) Encoder-Decoder Neural Networks, Clearly Explained!!! (YouTube)
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Sigmoid Activation · Tanh Activation · Sequence to Sequence · Long Short-Term Memory
