Learning Contextualized Sentence Representations for Document-Level Neural Machine Translation
Pei Zhang, Xu Zhang, Wei Chen, Jian Yu, Yanfeng Wang, Deyi Xiong

TL;DR
This paper introduces a framework for enhancing neural machine translation by training models to predict surrounding sentences, leading to more context-aware translations at the document level.
Contribution
The paper proposes two methods, joint training and pre-training & fine-tuning, to learn contextualized sentence embeddings and integrate them into NMT, improving translation quality over existing baselines.
Findings
Both proposed methods, joint training and pre-training & fine-tuning, significantly improve translation quality over the baseline models.
Pre-training on a large-scale monolingual document corpus enhances context modeling.
Abstract
Document-level machine translation incorporates inter-sentential dependencies into the translation of a source sentence. In this paper, we propose a new framework to model cross-sentence dependencies by training neural machine translation (NMT) to predict both the target translation and surrounding sentences of a source sentence. By enforcing the NMT model to predict source context, we want the model to learn "contextualized" source sentence representations that capture document-level dependencies on the source side. We further propose two different methods to learn and integrate such contextualized sentence embeddings into NMT: a joint training method that jointly trains an NMT model with the source context prediction model and a pre-training & fine-tuning method that pretrains the source context prediction model on a large-scale monolingual document corpus and then fine-tunes it with…
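As a rough illustration of the joint training idea described in the abstract, the sketch below combines a translation loss with losses for predicting the surrounding source sentences. It is not the authors' implementation: the function names, the choice of exactly one previous and one next sentence, and the `context_weight` parameter are all assumptions made for this example, and the abstract does not say how the terms are weighted.

```python
import math
from typing import Sequence


def sequence_nll(step_probs: Sequence[Sequence[float]], gold: Sequence[int]) -> float:
    """Negative log-likelihood of a gold token sequence.

    step_probs[t] is the model's probability distribution over the
    vocabulary at step t; gold[t] is the index of the correct token.
    """
    return -sum(math.log(dist[tok]) for dist, tok in zip(step_probs, gold))


def joint_loss(
    translation_nll: float,
    prev_sentence_nll: float,
    next_sentence_nll: float,
    context_weight: float = 1.0,  # hypothetical knob, not taken from the paper
) -> float:
    """Translation loss plus weighted source-context prediction losses."""
    return translation_nll + context_weight * (prev_sentence_nll + next_sentence_nll)


# Toy example: a two-word vocabulary and one-token "sentences".
target_nll = sequence_nll([[0.5, 0.5]], [0])    # target translation
prev_nll = sequence_nll([[0.25, 0.75]], [1])    # previous source sentence
next_nll = sequence_nll([[0.5, 0.5]], [1])      # next source sentence
total = joint_loss(target_nll, prev_nll, next_nll)
```

Setting `context_weight` to 0 in this sketch reduces the objective to plain sentence-level translation training, which makes the role of the context prediction terms easy to see.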
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
