# Data Augmentation for Low-Resource Neural Machine Translation

**Authors:** Marzieh Fadaee, Arianna Bisazza, Christof Monz

arXiv: 1705.00440 · 2018-02-14

## TL;DR

This paper introduces a data augmentation technique for low-resource neural machine translation that generates synthetic sentence pairs containing rare words. In simulated low-resource settings it improves translation quality by up to 2.9 BLEU points over the baseline and up to 3.2 BLEU points over back-translation.

## Contribution

The paper presents a data augmentation method for low-resource NMT, inspired by work in computer vision, that targets low-frequency words by generating new sentence pairs in which rare words appear in new, synthetically created contexts.

## Key findings

- Up to 2.9 BLEU point improvement over baseline
- Up to 3.2 BLEU point improvement over back-translation
- Results are reported on simulated low-resource settings

## Abstract

The quality of a Neural Machine Translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs this is not the case, resulting in poor translation quality. Inspired by work in computer vision, we propose a novel data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, synthetically created contexts. Experimental results on simulated low-resource settings show that our method improves translation quality by up to 2.9 BLEU points over the baseline and up to 3.2 BLEU over back-translation.
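The idea of placing rare words in new contexts can be illustrated with a minimal sketch. This is a toy illustration under stated assumptions, not the authors' implementation: the abstract does not say how contexts are selected or how the target side is updated, so the word alignments, the bilingual `lexicon`, and the `fits_context` plausibility check below are all hypothetical stand-ins, as are the function names and the example data.

```python
from collections import Counter


def find_rare_words(corpus, max_count):
    """Return source-side words that occur at most max_count times."""
    counts = Counter(tok for src, _tgt, _align in corpus for tok in src)
    return {word for word, count in counts.items() if count <= max_count}


def augment(corpus, rare_words, lexicon, fits_context):
    """Build new sentence pairs by swapping one aligned word pair for a rare one.

    corpus:       list of (src_tokens, tgt_tokens, alignment), where alignment
                  maps a source index to a target index (assumed to be given).
    lexicon:      hypothetical dict from a rare source word to its translation.
    fits_context: hypothetical callable(src_tokens, position, candidate) -> bool
                  standing in for whatever model judges plausibility.
    """
    new_pairs = []
    for src, tgt, align in corpus:
        for i, j in align.items():
            for rare in sorted(rare_words):
                # Skip words already in place and words we cannot translate.
                if rare == src[i] or rare not in lexicon:
                    continue
                if not fits_context(src, i, rare):
                    continue
                # Substitute on both sides so the pair stays parallel.
                new_src = src[:i] + [rare] + src[i + 1:]
                new_tgt = tgt[:j] + [lexicon[rare]] + tgt[j + 1:]
                new_pairs.append((new_src, new_tgt))
    return new_pairs


# Toy English-German data (illustrative only, not from the paper).
corpus = [
    (["i", "saw", "a", "dog"], ["ich", "sah", "einen", "hund"],
     {0: 0, 1: 1, 2: 2, 3: 3}),
    (["the", "dog", "runs"], ["der", "hund", "rennt"], {0: 0, 1: 1, 2: 2}),
    (["a", "badger", "runs"], ["ein", "dachs", "rennt"], {0: 0, 1: 1, 2: 2}),
]
lexicon = {"badger": "dachs"}


def fits_context(src, position, candidate):
    # Crude stand-in: only allow replacing the frequent noun "dog".
    return src[position] == "dog"


rare_words = find_rare_words(corpus, max_count=1)
new_pairs = augment(corpus, rare_words, lexicon, fits_context)
for new_src, new_tgt in new_pairs:
    print(" ".join(new_src), "|||", " ".join(new_tgt))
```

In this toy setup the rare word "badger" gains two additional training contexts. A real system would need a far more careful plausibility check and target-side substitution than the hard-coded rule and one-entry lexicon used here.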

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.00440/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1705.00440/full.md

## References

16 references — full list in the complete paper: https://tomesphere.com/paper/1705.00440/full.md

---
Source: https://tomesphere.com/paper/1705.00440