TL;DR
This paper introduces a simple yet effective method for integrating pre-trained language models into neural machine translation by training the translation model to predict residual probabilities, improving translation quality without complex architectures.
Contribution
The proposed residual probability method for LM integration is simpler than previous approaches to integrating LMs into NMT, since it needs no gating network to balance TM and LM, and it outperforms them.
Findings
Gains of between +0.24 and +2.36 BLEU on all four test sets.
Outperforms previous LM integration methods in NMT.
Simplifies architecture by removing gating networks.
Abstract
Neural Machine Translation (NMT) typically leverages monolingual data in training through backtranslation. We investigate an alternative simple method to use monolingual data for NMT training: We combine the scores of a pre-trained and fixed language model (LM) with the scores of a translation model (TM) while the TM is trained from scratch. To achieve that, we train the translation model to predict the residual probability of the training data added to the prediction of the LM. This enables the TM to focus its capacity on modeling the source sentence since it can rely on the LM for fluency. We show that our method outperforms previous approaches to integrate LMs into NMT while the architecture is simpler as it does not require gating networks to balance TM and LM. We observe gains of between +0.24 and +2.36 BLEU on all four test sets (English-Turkish, Turkish-English, Estonian-English,…
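The combination described in the abstract can be illustrated with a small sketch. The abstract does not give the exact formula, so this assumes one plausible reading: the TM's unnormalized scores are added to the fixed LM's log-probabilities and the sum is renormalized, with the usual cross-entropy loss applied to the combined distribution. The function names (`combined_log_probs`, `nll_loss`) and the toy shapes are hypothetical, not taken from the paper.

```python
import numpy as np

def log_softmax(x):
    # Numerically stable log-softmax over the last (vocabulary) axis.
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def combined_log_probs(tm_scores, lm_log_probs):
    # Assumption: the TM output acts as a residual added to the frozen LM's
    # log-probabilities; the sum is renormalized into a distribution.
    return log_softmax(tm_scores + lm_log_probs)

def nll_loss(tm_scores, lm_log_probs, target_ids):
    # Cross-entropy of the combined distribution against reference tokens.
    # In training, only the TM would be updated; the LM stays fixed.
    logp = combined_log_probs(tm_scores, lm_log_probs)
    return -logp[np.arange(len(target_ids)), target_ids].mean()

# Toy example: 3 decoding steps over a 5-word vocabulary.
rng = np.random.default_rng(0)
steps, vocab = 3, 5
lm_log_probs = log_softmax(rng.normal(size=(steps, vocab)))  # stand-in for a pre-trained LM
targets = np.array([1, 0, 4])

# A TM that outputs all zeros leaves the LM prediction unchanged.
zero_tm = np.zeros((steps, vocab))
loss_lm_only = nll_loss(zero_tm, lm_log_probs, targets)

# A TM that raises the score of the reference tokens lowers the loss.
boosted_tm = zero_tm.copy()
boosted_tm[np.arange(steps), targets] += 2.0
loss_boosted = nll_loss(boosted_tm, lm_log_probs, targets)
```

Under this reading, a TM that outputs all zeros reproduces the LM's distribution exactly, so the TM only has to learn where the translation should deviate from what the LM already predicts. That matches the abstract's point that the TM can rely on the LM for fluency and spend its capacity on the source sentence.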
