A GRU-Gated Attention Model for Neural Machine Translation
Biao Zhang, Deyi Xiong, Jinsong Su

TL;DR
This paper introduces a GRU-gated attention model for neural machine translation that produces more discriminative context vectors by making source representations sensitive to partial translations, leading to improved translation quality.
Contribution
The paper proposes a novel GRU-gated attention mechanism (GAtt) that makes context vectors more discriminative in NMT and outperforms vanilla attention models.
Findings
GAtt models significantly improve translation quality over vanilla attention.
Enhanced discrimination of context vectors reduces over-translation issues.
Experiments on Chinese-English translation demonstrate the effectiveness of GAtt.
Abstract
Neural machine translation (NMT) heavily relies on an attention network to produce a context vector for each target word prediction. In practice, we find that context vectors for different target words are quite similar to one another and therefore are insufficient in discriminatively predicting target words. The reason for this might be that context vectors produced by the vanilla attention network are just a weighted sum of source representations that are invariant to decoder states. In this paper, we propose a novel GRU-gated attention model (GAtt) for NMT which enhances the degree of discrimination of context vectors by enabling source representations to be sensitive to the partial translation generated by the decoder. GAtt uses a gated recurrent unit (GRU) to combine two types of information: treating a source annotation vector originally produced by the bidirectional encoder as…
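The contrast the abstract draws can be illustrated with a small sketch. Vanilla attention takes a weighted sum of source annotations that stay fixed across decoding steps, while a gated variant first refines each annotation with a GRU that also sees the decoder state. The code below is a minimal pure-Python illustration, not the authors' implementation. It rests on several assumptions: the source annotation is used as the GRU's history state and the previous decoder state as its input (the abstract above is truncated before it specifies the roles), dot-product scoring stands in for the attention scoring function, and all names (attend, gru_cell, gated_attend, random_params) and the random parameters are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    return [dot(row, x) for row in W]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(annotations, state):
    """Vanilla attention: a weighted sum of annotations.

    Dot-product scoring is an assumption made for brevity.
    """
    weights = softmax([dot(h, state) for h in annotations])
    dim = len(annotations[0])
    context = [sum(w * h[i] for w, h in zip(weights, annotations))
               for i in range(dim)]
    return context, weights


def gru_cell(x, h, p):
    """One standard GRU update with input x and history state h."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["W"], x), matvec(p["U"], rh))]
    # Interpolation convention: z weights the candidate state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def gated_attend(annotations, prev_state, p):
    """Refine every annotation with the GRU, then attend over the result.

    Assumption: annotation = history state, previous decoder state = input.
    """
    refined = [gru_cell(prev_state, h, p) for h in annotations]
    context, weights = attend(refined, prev_state)
    return context, weights, refined


def random_params(dim, rng):
    """Hypothetical random square weight matrices, for illustration only."""
    def mat():
        return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
    return {name: mat() for name in ("Wz", "Uz", "Wr", "Ur", "W", "U")}


rng = random.Random(0)
dim = 4
params = random_params(dim, rng)
annotations = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
state_a = [rng.uniform(-1, 1) for _ in range(dim)]
state_b = [rng.uniform(-1, 1) for _ in range(dim)]

vanilla_ctx, vanilla_w = attend(annotations, state_a)
ctx_a, w_a, refined_a = gated_attend(annotations, state_a, params)
ctx_b, w_b, refined_b = gated_attend(annotations, state_b, params)
```

In the vanilla call, only the weights depend on the decoder state; the annotations being summed are identical at every step. In the gated call, the annotations themselves (refined_a versus refined_b) change with the decoder state, which is the property the abstract credits for more discriminative context vectors.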
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques
Methods: Gated Recurrent Unit
