Look-ahead Attention for Generation in Neural Machine Translation
Long Zhou, Jiajun Zhang, Chengqing Zong

TL;DR
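This paper introduces a look-ahead attention mechanism for neural machine translation (NMT) that directly models dependencies between target words, including distant ones that recurrent decoders struggle to capture. It reports improved translation quality over conventional attention-based NMT models.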
Contribution
The paper proposes a novel look-ahead attention mechanism and three patterns for integrating it into the conventional attention model, strengthening the modeling of dependencies between target words in NMT.
Findings
Significant improvements on Chinese-English translation
Enhanced modeling of target word dependencies
Outperforms state-of-the-art baselines
Abstract
The attention model has become a standard component in neural machine translation (NMT), and it guides the translation process by selectively focusing on parts of the source sentence when predicting each target word. However, we find that the generation of a target word depends not only on the source sentence but also heavily on the previously generated target words, especially distant words, which are difficult to model with recurrent neural networks. To solve this problem, in this paper we propose a novel look-ahead attention mechanism for generation in NMT, which aims to directly capture the dependency relationships between target words. We further design three patterns to integrate our look-ahead attention into the conventional attention model. Experiments on NIST Chinese-to-English and WMT English-to-German translation tasks show that our proposed look-ahead attention…
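The abstract describes look-ahead attention only at a high level: when predicting a target word, the decoder also attends directly to the target words it has already generated, instead of relying solely on the recurrent state to carry that history. The sketch below illustrates that idea under stated assumptions. It uses plain dot-product scoring over hypothetical decoder state vectors; the paper's actual scoring function, parameterization, and its three patterns for combining the target-side context with the source-side context are not reproduced here. The helper names (`attend`, `look_ahead_context`) are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, memory: List[Vector]) -> Tuple[List[float], Vector]:
    """Dot-product attention (an assumption; the paper may score differently).

    Scores each memory vector against the query, normalizes the scores with
    softmax, and returns (weights, weighted sum of memory vectors).
    """
    weights = softmax([dot(query, m) for m in memory])
    dim = len(memory[0])
    context = [sum(w * m[i] for w, m in zip(weights, memory)) for i in range(dim)]
    return weights, context


def look_ahead_context(history: List[Vector], current: Vector) -> Tuple[List[float], Vector]:
    """Hypothetical helper: attend from the current decoder state over the
    states of all previously generated target words."""
    if not history:
        # First decoding step: no target history yet, so return a zero context.
        return [], [0.0] * len(current)
    return attend(current, history)


if __name__ == "__main__":
    # Hypothetical decoder states for a 4-word target prefix.
    states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, 0.0]]
    for t, s in enumerate(states):
        # Only states[:t] are visible: the model may look at earlier words,
        # never at words it has not generated yet.
        weights, ctx = look_ahead_context(states[:t], s)
        print(t, [round(w, 3) for w in weights], [round(c, 3) for c in ctx])
```

Applying the same `attend` routine to encoder annotations instead of `states[:t]` would give the conventional source-side context; how the two contexts are merged is exactly what the paper's integration patterns specify, so that step is deliberately left out of this sketch.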
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
