Look-ahead Attention for Generation in Neural Machine Translation

Long Zhou; Jiajun Zhang; Chengqing Zong

arXiv:1708.09217 · cs.CL · August 31, 2017 · 2 cites

Open Access

TL;DR

This paper introduces a look-ahead attention mechanism for neural machine translation that directly models dependencies between target words, improving translation quality over existing models.

Contribution

The paper proposes a novel look-ahead attention mechanism and three integration patterns to enhance dependency modeling in NMT.

Findings

01. Significant improvements on Chinese-English translation

02. Enhanced modeling of target word dependencies

03. Outperforms state-of-the-art baselines

Abstract

The attention model has become a standard component in neural machine translation (NMT); it guides the translation process by selectively focusing on parts of the source sentence when predicting each target word. However, we find that the generation of a target word depends not only on the source sentence but also heavily on the previously generated target words, especially distant words, which are difficult to model with recurrent neural networks. To solve this problem, we propose a novel look-ahead attention mechanism for generation in NMT, which aims to directly capture the dependency relationships between target words. We further design three patterns to integrate our look-ahead attention into the conventional attention model. Experiments on NIST Chinese-to-English and WMT English-to-German translation tasks show that our proposed look-ahead attention…
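The abstract describes the mechanism only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' formulation. At each decoding step, the current decoder state attends over the source annotations as usual, and also attends directly over the states of the previously generated target words, so a distant target word can contribute without having to survive many recurrent updates. Dot-product scoring, a single shared vector dimension, the hypothetical names `attend` and `look_ahead_step`, and combining the two context vectors by concatenation are all illustrative choices; the paper's three integration patterns are not reproduced here.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, memory: List[Vector]) -> Tuple[List[float], Vector]:
    """Dot-product attention (an assumed scoring function).

    Returns the attention weights over `memory` and the weighted sum
    of the memory vectors (the context vector).
    """
    weights = softmax([dot(query, m) for m in memory])
    dim = len(memory[0])
    context = [sum(w * m[i] for w, m in zip(weights, memory)) for i in range(dim)]
    return weights, context


def look_ahead_step(
    state: Vector,
    prev_target_states: List[Vector],
    source_states: List[Vector],
) -> Vector:
    """One decoding step with source attention plus target-side attention.

    Hypothetical sketch: all vectors share one dimension, and the two
    context vectors are simply concatenated.
    """
    # Conventional attention: focus on parts of the source sentence.
    _, source_ctx = attend(state, source_states)
    # Look-ahead attention: attend directly over the states of all previously
    # generated target words, so distant words are reachable in a single step.
    if prev_target_states:
        _, target_ctx = attend(state, prev_target_states)
    else:
        # First step: no target words yet, so use a zero context.
        target_ctx = [0.0] * len(state)
    # Assumed integration pattern: concatenation of the two contexts.
    return source_ctx + target_ctx


# Usage: a 2-dimensional toy example.
source = [[1.0, 0.0], [0.0, 1.0]]
previous = [[1.0, 0.0]]
combined = look_ahead_step([1.0, 0.0], previous, source)
print(len(combined))  # 4
```

Because the target-side memory grows by one state per generated word, each step only attends to words that have already been produced, which keeps the sketch consistent with left-to-right decoding.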

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications