Combining SMT and NMT Back-Translated Data for Efficient NMT
Alberto Poncelas, Maja Popovic, Dimitar Shterionov, Gideon Maillette de Buy Wenniger, Andy Way

TL;DR
This paper investigates the impact of combining back-translated data from both Neural Machine Translation (NMT) and Statistical Machine Translation (SMT) models on the performance of NMT systems, finding that merging these approaches yields the best results.
Contribution
It introduces a novel analysis of using combined SMT and NMT back-translated data for data augmentation in NMT training.
Findings
Best performance achieved with combined SMT and NMT back-translated data.
Synthetic data from multiple MT approaches improves translation quality.
Merging different MT approaches outperforms using a single approach.
Abstract
Neural Machine Translation (NMT) models achieve their best performance when large sets of parallel data are used for training. Consequently, techniques for augmenting the training set have become popular recently. One of these methods is back-translation (Sennrich et al., 2016), which consists of generating synthetic sentences by translating a set of monolingual, target-language sentences using a Machine Translation (MT) model. Generally, NMT models are used for back-translation. In this work, we analyze the performance of models when the training data is extended with synthetic data using different MT approaches. In particular, we investigate back-translated data generated not only by NMT but also by Statistical Machine Translation (SMT) models and combinations of both. The results reveal that the models achieve the best performances when the training set is augmented with…
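Illustrative sketch
The back-translation setup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the toy "translators", and the choice to simply concatenate authentic, SMT-generated, and NMT-generated pairs are all assumptions made for the example. The summary here does not state the mixing ratios or whether both systems translate the same monolingual sentences.

```python
from typing import Callable, List, Tuple

# A training example is a (source sentence, target sentence) pair.
SentencePair = Tuple[str, str]
# Hypothetical type: any target-to-source MT system, SMT or NMT.
Translator = Callable[[str], str]


def back_translate(target_sentences: List[str], translate: Translator) -> List[SentencePair]:
    """Pair each authentic target sentence with a synthetic source sentence."""
    return [(translate(t), t) for t in target_sentences]


def build_training_set(
    authentic: List[SentencePair],
    monolingual_target: List[str],
    smt_translate: Translator,
    nmt_translate: Translator,
) -> List[SentencePair]:
    """Extend authentic parallel data with synthetic pairs from both MT approaches.

    Assumption: both systems back-translate the same monolingual set and the
    results are concatenated without weighting or filtering.
    """
    synthetic_smt = back_translate(monolingual_target, smt_translate)
    synthetic_nmt = back_translate(monolingual_target, nmt_translate)
    return authentic + synthetic_smt + synthetic_nmt


# Toy stand-ins for real target-to-source systems (purely illustrative).
def toy_smt(sentence: str) -> str:
    return "smt(" + sentence + ")"


def toy_nmt(sentence: str) -> str:
    return "nmt(" + sentence + ")"


if __name__ == "__main__":
    authentic = [("ein Haus", "a house")]
    monolingual = ["a cat", "a dog"]
    for pair in build_training_set(authentic, monolingual, toy_smt, toy_nmt):
        print(pair)
```

In a real setup, toy_smt and toy_nmt would be replaced by trained target-to-source systems, and the target side of every synthetic pair would remain authentic text, which is what makes the augmented data useful for training the forward model.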
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques
