Learning Source Phrase Representations for Neural Machine Translation
Hongfei Xu, Josef van Genabith, Deyi Xiong, Qiuhui Liu, and Jingyi Zhang

TL;DR
This paper introduces a method to generate and incorporate phrase representations into Transformer-based neural machine translation models, significantly improving long-distance dependency modeling and translation quality with fewer parameters.
Contribution
It proposes an attentive phrase representation mechanism and demonstrates its effectiveness in enhancing Transformer NMT models, especially for long sentences.
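To make the idea of an attentive phrase representation concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that the sentence is cut into fixed-length segments, that the attention query is the mean of the token vectors in a segment, and that scores are scaled dot products. The function names and the phrase_len parameter are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_phrase_vector(token_vecs):
    """Pool the token vectors of one phrase into a single vector.

    Assumption: the query is the mean of the token vectors, and each
    token is scored by a scaled dot product with that query.
    """
    n = len(token_vecs)
    dim = len(token_vecs[0])
    # Mean vector of the phrase, used as the attention query (assumption).
    query = [sum(v[i] for v in token_vecs) / n for i in range(dim)]
    # Scaled dot-product score of each token against the query.
    scores = [
        sum(q * x for q, x in zip(query, v)) / math.sqrt(dim)
        for v in token_vecs
    ]
    weights = softmax(scores)
    # Attention-weighted sum of the token vectors.
    return [
        sum(w * v[i] for w, v in zip(weights, token_vecs))
        for i in range(dim)
    ]


def phrase_representations(token_vecs, phrase_len=4):
    """Split a sentence into fixed-length segments and pool each one.

    Assumption: phrases are consecutive, non-overlapping segments of
    phrase_len tokens; the last segment may be shorter.
    """
    return [
        attentive_phrase_vector(token_vecs[start:start + phrase_len])
        for start in range(0, len(token_vecs), phrase_len)
    ]
```

Calling phrase_representations on a list of token embeddings returns one pooled vector per segment. In a full model these vectors would be made available to the Transformer alongside the token representations; how that is done is outside the scope of this sketch.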
Findings
Significant improvements on the WMT 14 English-German and English-French translation tasks.
Transformer models with phrase representations match or surpass Transformer Big performance.
Enhanced long-distance dependency capture with fewer parameters.
Abstract
The Transformer translation model (Vaswani et al., 2017) based on a multi-head attention mechanism can be computed effectively in parallel and has significantly pushed forward the performance of Neural Machine Translation (NMT). Though intuitively the attentional network can connect distant words via shorter network paths than RNNs, empirical analysis demonstrates that it still has difficulty in fully capturing long-distance dependencies (Tang et al., 2018). Considering that modeling phrases instead of words has significantly improved the Statistical Machine Translation (SMT) approach through the use of larger translation blocks ("phrases") and its reordering ability, modeling NMT at phrase level is an intuitive proposal to help the model capture long-distance relationships. In this paper, we first propose an attentive phrase representation generation mechanism which is able to generate…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Multi-Head Attention · Adam · Dropout · Byte Pair Encoding
