Fully Synthetic Data Improves Neural Machine Translation with Knowledge Distillation
Alham Fikri Aji, Kenneth Heafield

TL;DR
This study shows that fully synthetic data generated through round-trip translation improves neural machine translation trained with knowledge distillation. Combining source and target monolingual data works best, and the size of the gain depends on the provenance of the test set.
Contribution
It introduces a new approach to knowledge distillation in neural machine translation: using fully synthetic data, produced by round-trip translation, as training data.
Findings
Combining source and target monolingual data improves translation quality.
The effectiveness of data augmentation depends on test set language origin.
Round-trip translating target-language monolingual text yields significant gains.
Abstract
This paper explores augmenting monolingual data for knowledge distillation in neural machine translation. Source language monolingual text can be incorporated as a forward translation. Interestingly, we find the best way to incorporate target language monolingual text is to translate it to the source language and round-trip translate it back to the target language, resulting in a fully synthetic corpus. We find that combining monolingual data from both source and target languages yields better performance than a corpus twice as large in only one language. Moreover, experiments reveal that the improvement depends upon the provenance of the test set. If the test set was originally in the source language (with the target side written by translators), then forward translating source monolingual data matters. If the test set was originally in the target language (with the source written by…
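The corpus construction described in the abstract can be sketched as below. This is a minimal illustration, not the authors' implementation. The translators are plain Python callables standing in for real models, the names `build_distillation_corpus`, `teacher_fwd`, and `teacher_bwd` are hypothetical, and the sketch assumes the source-to-target teacher performs the second leg of the round trip.

```python
from typing import Callable, List, Tuple

# A "translator" maps a batch of sentences to a batch of translations.
# Real systems would wrap an NMT model here; this is an assumption.
Translate = Callable[[List[str]], List[str]]


def build_distillation_corpus(
    src_mono: List[str],
    tgt_mono: List[str],
    teacher_fwd: Translate,  # hypothetical source -> target teacher
    teacher_bwd: Translate,  # hypothetical target -> source model
) -> List[Tuple[str, str]]:
    """Return (source, target) pairs for training a student model."""
    # Forward translation: authentic source text, teacher-generated target.
    pairs = list(zip(src_mono, teacher_fwd(src_mono)))

    # Round trip: target text -> synthetic source -> synthetic target.
    # The original target sentences are discarded, so both sides of
    # these pairs are synthetic ("fully synthetic" data).
    synth_src = teacher_bwd(tgt_mono)
    synth_tgt = teacher_fwd(synth_src)
    pairs.extend(zip(synth_src, synth_tgt))
    return pairs


# Toy stand-ins for real models, used only to show the data flow.
def toy_fwd(xs: List[str]) -> List[str]:
    return [x.upper() for x in xs]


def toy_bwd(xs: List[str]) -> List[str]:
    return [x[::-1] for x in xs]


corpus = build_distillation_corpus(["ab"], ["xy"], toy_fwd, toy_bwd)
print(corpus)
```

With the toy translators, the source sentence "ab" yields the forward-translated pair ("ab", "AB"), while the target sentence "xy" never appears in the output: it is replaced by the round-tripped pair ("yx", "YX").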
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning in Bioinformatics
Methods: Knowledge Distillation
