# On The Evaluation of Machine Translation Systems Trained With Back-Translation

**Authors:** Sergey Edunov, Myle Ott, Marc'Aurelio Ranzato, Michael Auli

arXiv: 1908.05204 · 2020-08-19

## TL;DR

This paper evaluates back-translation for machine translation and shows, through judgments by professional human translators, that it improves translation quality on both naturally occurring and translationese source text, even though BLEU registers significant gains only on translationese inputs.

## Contribution

The paper shows empirically that back-translation benefits both naturally occurring and translationese source text, and it recommends complementing BLEU with a language model score to measure fluency.

## Key findings

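- Back-translation improves translation quality on both naturally occurring and translationese source text, according to professional human translators.
- BLEU cannot capture these human preferences when the source sentences are natural text, because the references are then translationese.
- The evidence supports the view that humans prefer back-translation output because it is more fluent, so a language model score complements BLEU by measuring fluency.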
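To make the idea of a language model fluency score concrete, here is a minimal sketch. It assumes an add-one-smoothed bigram model trained on a tiny toy corpus, which is a stand-in chosen for illustration: this summary does not describe the language model the paper actually uses. The function names `train_bigram_lm` and `perplexity` are hypothetical. Lower perplexity is read as "more fluent", and the score needs no reference translation.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count bigram contexts and bigrams over whitespace-tokenized sentences.

    Assumption: a toy add-one-smoothed bigram model stands in for a real
    language model trained on target-language monolingual text.
    """
    contexts, bigrams, vocab = Counter(), Counter(), {"<s>", "</s>"}
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])                  # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # (previous, current) pairs
    return contexts, bigrams, len(vocab)


def perplexity(sentence, lm):
    """Per-token perplexity of a sentence; lower means more fluent under the model."""
    contexts, bigrams, vocab_size = lm
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_prob = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        # Add-one smoothing keeps unseen bigrams at a small non-zero probability.
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))


# Toy target-language corpus (illustrative data, not from the paper).
lm = train_bigram_lm([
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat lay on the mat",
])

fluent_ppl = perplexity("the cat sat on the mat", lm)
scrambled_ppl = perplexity("mat the on sat cat the", lm)
print(fluent_ppl < scrambled_ppl)
```

The two scored sentences contain exactly the same words, so a metric based only on unigram overlap with a reference could not separate them, while the language model score does. Reporting such a score next to BLEU gives a view of fluency that does not depend on how the references were produced.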

## Abstract

Back-translation is a widely used data augmentation technique which leverages target monolingual data. However, its effectiveness has been challenged since automatic metrics such as BLEU only show significant improvements for test examples where the source itself is a translation, or translationese. This is believed to be due to translationese inputs better matching the back-translated training data. In this work, we show that this conjecture is not empirically supported and that back-translation improves translation quality of both naturally occurring text as well as translationese according to professional human translators. We provide empirical evidence to support the view that back-translation is preferred by humans because it produces more fluent outputs. BLEU cannot capture human preferences because references are translationese when source sentences are natural text. We recommend complementing BLEU with a language model score to measure fluency.
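As a rough illustration of the augmentation the abstract describes, the sketch below builds a training set from real bitext plus back-translated target monolingual data. It assumes a target-to-source model is available as a plain function; `toy_reverse_translate` is a hypothetical word-lookup stand-in for such a model, and all names and sentences are invented for the example rather than taken from the paper.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    target_monolingual: Iterable[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Pair each real target sentence with a synthetic source produced by a
    target-to-source model."""
    return [(reverse_translate(tgt), tgt) for tgt in target_monolingual]


def build_training_set(
    bitext: Iterable[Pair],
    target_monolingual: Iterable[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Concatenate human-translated bitext with back-translated pairs."""
    return list(bitext) + back_translate(target_monolingual, reverse_translate)


# Hypothetical stand-in for a trained German-to-English model: word lookup only.
TOY_DE_TO_EN = {"die": "the", "der": "the", "katze": "cat", "hund": "dog",
                "schlaeft": "sleeps", "bellt": "barks"}


def toy_reverse_translate(sentence: str) -> str:
    return " ".join(TOY_DE_TO_EN.get(word, word) for word in sentence.lower().split())


bitext = [("the cat sleeps", "die katze schlaeft")]  # real English-German pair
mono = ["der hund bellt"]                            # German text with no English side
training_set = build_training_set(bitext, mono, toy_reverse_translate)
print(training_set)
```

In the synthetic pairs the target side is always genuine text and only the source side is machine-generated, which is why the technique can exploit target monolingual data.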

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.05204/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1908.05204/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/1908.05204/full.md

---
Source: https://tomesphere.com/paper/1908.05204