Understanding Back-Translation at Scale
Sergey Edunov, Myle Ott, Michael Auli, David Grangier

TL;DR
This paper compares methods for generating back-translations in neural machine translation. It finds that sampled and noised beam outputs outperform standard beam search in all but resource-poor settings, and that scaling to hundreds of millions of monolingual sentences yields a new state of the art of 35 BLEU on WMT'14 English-German.
Contribution
It provides a broad analysis of back-translation techniques, shows that sampling and noised beam outputs beat beam search in all but resource-poor settings, and scales these methods to hundreds of millions of monolingual sentences.
Findings
Sampling and noised beam back-translations outperform plain beam search in all but resource-poor settings.
Sampled or noised synthetic data provides a much stronger training signal than synthetic data generated by beam or greedy search.
The paper achieves a new state of the art of 35 BLEU on the WMT'14 English-German test set.
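To make the contrast between greedy search and sampling concrete, here is a minimal sketch of how the two pick a single token. The distribution `toy_probs` and the helper names are hypothetical and for illustration only; a real back-translation model would produce such a distribution at every decoding step.

```python
import random

def greedy_pick(probs):
    """Return the most probable token, as greedy search does at each step."""
    return max(probs, key=probs.get)

def sample_pick(probs, rng):
    """Draw one token in proportion to its probability (unrestricted sampling)."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical next-token distribution, not taken from the paper.
toy_probs = {"house": 0.6, "home": 0.3, "building": 0.1}

rng = random.Random(0)
greedy_outputs = {greedy_pick(toy_probs) for _ in range(100)}
sampled_outputs = {sample_pick(toy_probs, rng) for _ in range(100)}

print("greedy:", greedy_outputs)    # always the single most likely token
print("sampled:", sampled_outputs)  # a spread of tokens from the distribution
```

Greedy selection returns the same token every time, while sampling covers more of the distribution, which is one intuition for why sampled synthetic sources are more varied than beam or greedy outputs.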
Abstract
An effective method to improve neural machine translation with monolingual data is to augment the parallel training corpus with back-translations of target language sentences. This work broadens the understanding of back-translation and investigates a number of methods to generate synthetic source sentences. We find that in all but resource poor settings back-translations obtained via sampling or noised beam outputs are most effective. Our analysis shows that sampling or noisy synthetic data gives a much stronger training signal than data generated by beam or greedy search. We also compare how synthetic data compares to genuine bitext and study various domain effects. Finally, we scale to hundreds of millions of monolingual sentences and achieve a new state of the art of 35 BLEU on the WMT'14 English-German test set.
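The abstract does not spell out how beam outputs are noised. The sketch below assumes three simple operations often used for this purpose: random word deletion, replacement by a filler token, and a local shuffle that moves each word at most a few positions. The function name `add_noise`, the probabilities, and the `<BLANK>` token are illustrative assumptions, not values confirmed by this summary.

```python
import random

def add_noise(tokens, rng, p_drop=0.1, p_blank=0.1, max_shift=3, filler="<BLANK>"):
    """Return a noised copy of a tokenized synthetic source sentence.

    All parameter values are assumptions chosen for illustration.
    """
    # 1) Delete each word independently with probability p_drop.
    kept = [t for t in tokens if rng.random() >= p_drop]
    # 2) Replace each remaining word by a filler token with probability p_blank.
    blanked = [filler if rng.random() < p_blank else t for t in kept]
    # 3) Local shuffle: add uniform noise in [0, max_shift + 1) to each index
    #    and sort by the noisy key, so no word moves more than max_shift places.
    keys = [i + rng.uniform(0, max_shift + 1) for i in range(len(blanked))]
    order = sorted(range(len(blanked)), key=lambda i: keys[i])
    return [blanked[i] for i in order]

rng = random.Random(0)
beam_output = "the cat sat on the mat near the door".split()
print(add_noise(beam_output, rng))
```

In a back-translation pipeline, a function like this would be applied to each beam-generated source sentence before pairing it with its genuine target sentence.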
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
