Regularized Context Gates on Transformer for Machine Translation
Xintong Li, Lemao Liu, Rui Wang, Guoping Huang, Max Meng

TL;DR
This paper introduces a regularized context gate mechanism for Transformer-based neural machine translation, improving source-target contribution control and achieving consistent BLEU score gains across multiple datasets.
Contribution
It proposes a method to incorporate context gates into the Transformer architecture and to regularize them, with the aim of improving translation quality.
Findings
Achieved an average gain of 1.0 BLEU over a strong Transformer baseline.
Controlled the source and target contributions in the Transformer through a gate mechanism.
Validated on four translation datasets.
Abstract
Context gates are effective at controlling the contributions from the source and target contexts in recurrent neural network (RNN) based neural machine translation (NMT). However, it is challenging to extend them to the more advanced Transformer architecture, which is more complicated than an RNN. This paper first provides a method to identify source and target contexts and then introduces a gate mechanism to control the source and target contributions in the Transformer. In addition, to further reduce the bias problem in the gate mechanism, this paper proposes a regularization method that guides the learning of the gates with supervision automatically generated using pointwise mutual information. Extensive experiments on 4 translation datasets demonstrate that the proposed model obtains an average gain of 1.0 BLEU over a strong Transformer baseline.
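The two ideas named in the abstract, a gate that mixes source and target context and pointwise mutual information (PMI) as a supervision signal, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the element-wise sigmoid gate over the concatenated contexts, and the count-based PMI estimate are all assumptions made for illustration, and the abstract does not say how PMI scores are converted into gate targets, so that step is left out.

```python
import math
from collections import Counter


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def context_gate(source_ctx, target_ctx, weights, bias):
    """Mix two context vectors with an element-wise gate (illustrative form).

    source_ctx, target_ctx: lists of d floats.
    weights: d rows of 2*d floats, bias: d floats (hypothetical parameters).
    Returns (mixed, gate), where mixed[i] = g[i]*source[i] + (1-g[i])*target[i].
    """
    concat = list(source_ctx) + list(target_ctx)
    gate = [
        sigmoid(sum(w * v for w, v in zip(row, concat)) + b)
        for row, b in zip(weights, bias)
    ]
    mixed = [
        g * s + (1.0 - g) * t
        for g, s, t in zip(gate, source_ctx, target_ctx)
    ]
    return mixed, gate


def pmi(pairs):
    """Count-based PMI, log p(x, y) / (p(x) p(y)), for each observed pair."""
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    n = len(pairs)
    return {
        (x, y): math.log((c / n) / ((left[x] / n) * (right[y] / n)))
        for (x, y), c in joint.items()
    }
```

A gate value near 1 lets the source context dominate a dimension, and a value near 0 lets the target context dominate. With all-zero weights and bias, every gate value is 0.5 and the output is the plain average of the two contexts.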
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
