Exploring Diversity in Back Translation for Low-Resource Machine Translation

Laurie Burchell; Alexandra Birch; Kenneth Heafield

arXiv:2206.00564 · cs.CL · September 1, 2023

TL;DR

This paper introduces a nuanced framework for measuring lexical and syntactic diversity in back translation, demonstrating that higher diversity, especially lexical, improves low-resource neural machine translation.

Contribution

It proposes new metrics for diversity, analyzes their impact on translation quality, and shows nucleus sampling enhances diversity and performance in low-resource settings.
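
As a rough illustration of what a lexical diversity measurement can look like, the sketch below computes a distinct n-gram ratio over a set of sentences. This is a generic proxy chosen for illustration, not the metric proposed in the paper; the function name `distinct_n` and the whitespace tokenisation are assumptions.

```python
def distinct_n(sentences, n=1):
    """Return the ratio of unique n-grams to total n-grams across sentences.

    Illustrative proxy only (hypothetical helper, not the paper's metric).
    Values near 1.0 mean little repetition; values near 0.0 mean heavy reuse.
    """
    total = 0
    unique = set()
    for sentence in sentences:
        # Naive lowercase whitespace tokenisation, assumed for simplicity.
        tokens = sentence.lower().split()
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    # Guard against empty input or sentences shorter than n tokens.
    return len(unique) / total if total else 0.0


if __name__ == "__main__":
    repeated = ["the cat sat", "the cat sat"]
    varied = ["the cat sat", "a dog ran"]
    print(distinct_n(repeated, n=1))
    print(distinct_n(varied, n=1))
```

A set of back translations that reuses the same words scores lower than one that varies its vocabulary, which is the intuition behind treating lexical diversity separately from syntactic diversity.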

Findings

1. Back translation generated with nucleus sampling yields higher final translation performance.

2. Lexical diversity matters more than syntactic diversity for back translation performance.

3. Diversity metrics correlate with improved translation quality.

Abstract

Back translation is one of the most widely used methods for improving the performance of neural machine translation systems. Recent research has sought to enhance the effectiveness of this method by increasing the 'diversity' of the generated translations. We argue that the definitions and metrics used to quantify 'diversity' in previous work have been insufficient. This work puts forward a more nuanced framework for understanding diversity in training data, splitting it into lexical diversity and syntactic diversity. We present novel metrics for measuring these different aspects of diversity and carry out empirical analysis into the effect of these types of diversity on final neural machine translation model performance for low-resource English $\leftrightarrow$ Turkish and mid-resource English $\leftrightarrow$ Icelandic. Our findings show that generating back translation using nucleus…
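
For readers unfamiliar with nucleus (top-p) sampling, the generation method named in the abstract, the following is a minimal sketch of a single decoding step over a toy probability distribution. The function names, the default threshold `p=0.9`, and the example distribution are assumptions for illustration; this is not the authors' code, and a real system would apply the step to a model's next-token distribution at every position.

```python
import random


def nucleus_indices(probs, p=0.9):
    """Return the smallest set of highest-probability indices whose mass is >= p."""
    # Rank token indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in ranked:
        nucleus.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return nucleus


def nucleus_sample(probs, p=0.9, rng=None):
    """Draw one index from the nucleus, with probabilities renormalised inside it."""
    rng = rng or random.Random()
    nucleus = nucleus_indices(probs, p)
    # random.choices renormalises the weights, so raw probabilities suffice.
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical next-token distribution
    rng = random.Random(0)
    print(nucleus_indices(toy_probs, p=0.9))
    print([nucleus_sample(toy_probs, p=0.9, rng=rng) for _ in range(10)])
```

Truncating the low-probability tail keeps sampling from producing very unlikely tokens while still allowing more varied output than always picking the most probable token.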

Code & Models

Repositories

laurieburchell/exploring-diversity-bt (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification