Neural Machine Translation Training in a Multi-Domain Scenario
Hassan Sajjad, Nadir Durrani, Fahim Dalvi, Yonatan Belinkov, and Stephan Vogel

TL;DR
This paper compares training strategies for neural machine translation in multi-domain settings. The best translation quality comes from training on concatenated out-of-domain data and then fine-tuning on in-domain data; model stacking works best when domains are ordered from furthest to nearest.
Contribution
The paper compares four training approaches for multi-domain neural machine translation: data concatenation with fine-tuning, model stacking (multi-level fine-tuning), data selection, and multi-model ensembling.
Findings
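Training an initial system on concatenated out-of-domain data and then fine-tuning it on in-domain data yields the best translation quality.
Model stacking works best when training starts with the furthest out-of-domain data and the model is fine-tuned incrementally on each next-furthest domain.
Data selection does not give the best results, but is a reasonable compromise between training time and translation quality.
A weighted ensemble of individual models outperforms data selection.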
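To make the weighted-ensemble finding concrete, here is a minimal sketch of one common way to combine models at decoding time: linearly interpolating each model's next-token probabilities. The summary does not say how the paper computes or applies its weights, so the interpolation scheme, the function name `weighted_ensemble`, and the toy distributions are all assumptions made for illustration.

```python
from typing import Dict, List


def weighted_ensemble(
    distributions: List[Dict[str, float]],
    weights: List[float],
) -> Dict[str, float]:
    """Linearly interpolate per-model next-token distributions.

    Hypothetical helper: each entry of `distributions` maps a token to the
    probability one model assigns to it at the current decoding step.
    """
    if len(distributions) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    combined: Dict[str, float] = {}
    for dist, weight in zip(distributions, weights):
        share = weight / total  # normalise so the result is a distribution
        for token, prob in dist.items():
            combined[token] = combined.get(token, 0.0) + share * prob
    return combined


# Toy example (made-up numbers): the in-domain model gets three times
# the weight of the out-of-domain model.
in_domain_model = {"a": 0.6, "b": 0.4}
out_of_domain_model = {"a": 0.2, "b": 0.8}
mixed = weighted_ensemble([in_domain_model, out_of_domain_model], [3.0, 1.0])
```

In a real decoder the same interpolation would run at every step of beam search, with weights typically tuned on held-out in-domain data.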
Abstract
In this paper, we explore alternative ways to train a neural machine translation system in a multi-domain scenario. We investigate data concatenation (with fine-tuning), model stacking (multi-level fine-tuning), data selection and multi-model ensemble. Our findings show that the best translation quality can be achieved by building an initial system on a concatenation of available out-of-domain data and then fine-tuning it on in-domain data. Model stacking works best when training begins with the furthest out-of-domain data and the model is incrementally fine-tuned with the next furthest domain and so on. Data selection did not give the best results, but can be considered as a decent compromise between training time and translation quality. A weighted ensemble of different individual models performed better than data selection. It is beneficial in a scenario when there is no time for…
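The two fine-tuning strategies described in the abstract differ mainly in how the training data is staged. The sketch below expresses each as an ordered list of stages, where every stage names the corpora the model is trained on before moving to the next. The function names, the domain labels, and the numeric distance scores are hypothetical; the abstract does not say how distance from the in-domain data is measured.

```python
from typing import Dict, List


def stacking_schedule(
    distance_to_in_domain: Dict[str, float],
    in_domain: str,
) -> List[List[str]]:
    """Model stacking: one stage per corpus, furthest domain first.

    `distance_to_in_domain` holds assumed scores for the out-of-domain
    corpora (larger = further from the in-domain data).
    """
    ordered = sorted(
        distance_to_in_domain,
        key=distance_to_in_domain.get,
        reverse=True,  # furthest out-of-domain corpus is trained first
    )
    return [[name] for name in ordered] + [[in_domain]]


def concat_finetune_schedule(
    out_of_domain: List[str],
    in_domain: str,
) -> List[List[str]]:
    """Concatenation + fine-tuning: all out-of-domain data, then in-domain."""
    return [sorted(out_of_domain), [in_domain]]


# Hypothetical corpora and distances, for illustration only.
stacked = stacking_schedule({"near": 0.2, "far": 0.9, "mid": 0.5}, "target")
concatenated = concat_finetune_schedule(["mid", "far", "near"], "target")
```

Here `stacked` has four single-corpus stages ending with the in-domain data, while `concatenated` has only two stages, which is the setup the abstract reports as giving the best translation quality.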
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
