Learning When to Attend for Neural Machine Translation
Junhui Li, Muhua Zhu

TL;DR
This paper introduces a novel attention mechanism for neural machine translation that dynamically decides when to attend to source words, improving translation quality on Chinese-English tasks.
Contribution
The paper proposes a new attention model that determines when the decoder should attend to source words, addressing limitations of previous attention mechanisms.
Findings
Achieves a 0.8 BLEU improvement over a state-of-the-art baseline
Demonstrates effectiveness on NIST Chinese-English translation
Addresses the issue of target words with no source counterparts
Abstract
In the past few years, attention mechanisms have become an indispensable component of end-to-end neural machine translation models. However, previous attention models always refer to some source words when predicting a target word, which contradicts the fact that some target words have no corresponding source words. Motivated by this observation, we propose a novel attention model that has the capability of determining when a decoder should attend to source words and when it should not. Experimental results on NIST Chinese-English translation tasks show that the new model achieves an improvement of 0.8 BLEU score over a state-of-the-art baseline.
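The abstract does not spell out how the decoder decides when to attend. As a minimal sketch of the general idea, the code below assumes a scalar sigmoid gate computed from the decoder state that scales the attention context vector; when the gate is near 0, the source context is effectively ignored. The gate formulation, the dot-product scoring, and all names (`gated_attention`, `gate_weights`, `gate_bias`) are illustrative assumptions and not necessarily the authors' formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def gated_attention(decoder_state, source_states, gate_weights, gate_bias):
    """Hypothetical gated attention: returns (gated_context, gate, weights).

    Assumption: the gate is sigmoid(gate_weights . decoder_state + gate_bias).
    """
    # Standard attention: dot-product scores normalized with softmax.
    scores = [dot(decoder_state, h) for h in source_states]
    weights = softmax(scores)

    # Weighted sum of source states gives the usual context vector.
    dim = len(source_states[0])
    context = [
        sum(w * h[i] for w, h in zip(weights, source_states))
        for i in range(dim)
    ]

    # Scalar gate in (0, 1): close to 0 means "do not attend to the source".
    gate = 1.0 / (1.0 + math.exp(-(dot(gate_weights, decoder_state) + gate_bias)))
    gated_context = [gate * c for c in context]
    return gated_context, gate, weights
```

With a strongly negative `gate_bias` the returned context shrinks toward the zero vector, which mimics predicting a target word that has no source counterpart; with a strongly positive bias the function behaves like ordinary attention.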
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
