Tag-less Back-Translation
Idris Abdulmumin, Bashir Shehu Galadanci, Aliyu Garba

TL;DR
This paper introduces a tag-less back-translation method for neural machine translation that treats synthetic and authentic data as out-of-domain and in-domain, respectively, improving training efficiency and translation quality without explicit tagging.
Contribution
It proposes a novel domain adaptation approach to back-translation, eliminating the need for explicit tags and enhancing performance on low-resource language pairs.
Findings
Outperforms standard back-translation on English-Vietnamese and English-German tasks.
Efficiently utilizes monolingual data without explicit tagging.
Improves low-resource neural machine translation performance.
Abstract
An effective way to generate large numbers of parallel sentences for training improved neural machine translation (NMT) systems is to back-translate target-side monolingual data. Standard back-translation has been shown to make inefficient use of the large amount of available monolingual data, because translation models cannot differentiate between authentic and synthetic parallel data during training. Tagging, or using gates, has been used to let translation models distinguish synthetic from authentic data, improving on standard back-translation and also enabling iterative back-translation on language pairs that underperformed with standard back-translation. In this work, we approach back-translation as a domain adaptation problem, eliminating the need for explicit tagging. In the approach --…
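To make the contrast between tagged and tag-less data handling concrete, here is a minimal sketch in Python. It is not the authors' implementation. The function names, the `<BT>` tag string, the toy backward model, and the "pretrain then finetune" ordering are all assumptions made for illustration. The truncated abstract above does not spell out the training schedule; pre-training on out-of-domain data and fine-tuning on in-domain data is simply one common domain-adaptation recipe.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(target_sentences: Sequence[str],
                   backward_model: Callable[[str], str]) -> List[Pair]:
    """Pair each authentic target sentence with a synthetic source produced
    by a target-to-source model (hypothetical callable)."""
    return [(backward_model(t), t) for t in target_sentences]


def tagged_corpus(synthetic: Sequence[Pair], authentic: Sequence[Pair],
                  tag: str = "<BT>") -> List[Pair]:
    """Tagged variant: mark synthetic sources with a token and mix everything.
    The tag string is an illustrative assumption."""
    return [(f"{tag} {src}", tgt) for src, tgt in synthetic] + list(authentic)


def tagless_schedule(synthetic: Sequence[Pair],
                     authentic: Sequence[Pair]) -> List[Tuple[str, List[Pair]]]:
    """Tag-less variant: no markers in the text. The two data sources are kept
    apart and presented in separate phases (assumed order: synthetic as
    out-of-domain first, authentic as in-domain second)."""
    return [("pretrain", list(synthetic)), ("finetune", list(authentic))]


def toy_backward_model(sentence: str) -> str:
    """Stand-in for a real target-to-source NMT model."""
    return sentence.upper()


authentic = [("xin chào", "hello")]
synthetic = back_translate(["good morning"], toy_backward_model)

for phase, pairs in tagless_schedule(synthetic, authentic):
    print(phase, pairs)
```

In the tagged variant the distinction between the two data sources lives in the input text itself; in the tag-less sketch it lives only in the order in which the data is presented to the model.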
