Improving Neural Machine Translation with Pre-trained Representation
Rongxiang Weng, Heng Yu, Shujian Huang, Weihua Luo, Jiajun Chen

TL;DR
This paper introduces a method that enhances neural machine translation by leveraging sentence-level contextual representations learned from monolingual data, improving translation quality, especially in low-resource scenarios.
Contribution
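The paper proposes a new structure for acquiring sentence-level contextual representations from monolingual data, together with a framework for integrating both source-side and target-side representations into NMT models. Sentence-level contextual knowledge of this kind had not been fully exploited by earlier approaches, which relied mainly on word-level knowledge.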
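As a rough illustration of what "integrating" a pre-trained representation into an NMT model can mean, the sketch below blends one NMT hidden state with one pre-trained sentence-level state through a learned scalar gate. This summary does not describe the paper's actual integration mechanism, so gated fusion here is an assumption chosen for illustration, and the names `gated_fusion`, `gate_weights`, and `gate_bias` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Logistic function, used to squash the gate score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    nmt_state: Vector,
    pretrained_state: Vector,
    gate_weights: Vector,
    gate_bias: float = 0.0,
) -> Vector:
    """Blend an NMT hidden state with a pre-trained contextual state.

    Assumption (not taken from the paper): a single scalar gate is computed
    from the concatenation of both states, then used to interpolate them.
    """
    if len(nmt_state) != len(pretrained_state):
        raise ValueError("both states must have the same dimension")
    concat = nmt_state + pretrained_state  # list concatenation, length 2d
    if len(gate_weights) != len(concat):
        raise ValueError("gate_weights must have length 2 * dimension")

    # Scalar gate: values near 1 keep the NMT state, values near 0 keep
    # the pre-trained state.
    score = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
    gate = sigmoid(score)

    return [gate * h + (1.0 - gate) * r
            for h, r in zip(nmt_state, pretrained_state)]


if __name__ == "__main__":
    h = [1.0, 3.0]          # hypothetical NMT encoder state
    r = [3.0, 5.0]          # hypothetical pre-trained sentence-level state
    w = [0.0, 0.0, 0.0, 0.0]  # zero weights give a gate of exactly 0.5
    print(gated_fusion(h, r, w))
```

With zero gate weights and zero bias the gate is 0.5, so the result is the element-wise average of the two states. In a trained model the gate parameters would be learned jointly with the translation objective.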
Findings
Improves translation quality on Chinese-English and German-English tasks.
Effective in low-resource English-Turkish translation.
Outperforms strong Transformer baselines.
Abstract
Monolingual data has been demonstrated to be helpful in improving the translation quality of neural machine translation (NMT). The current methods stay at the usage of word-level knowledge, such as generating synthetic parallel data or extracting information from word embedding. In contrast, the power of sentence-level contextual knowledge which is more complex and diverse, playing an important role in natural language generation, has not been fully exploited. In this paper, we propose a novel structure which could leverage monolingual data to acquire sentence-level contextual representations. Then, we design a framework for integrating both source and target sentence-level representations into NMT model to improve the translation quality. Experimental results on Chinese-English, German-English machine translation tasks show that our proposed model achieves improvement over strong…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
