# Tagged Back-Translation

**Authors:** Isaac Caswell, Ciprian Chelba, David Grangier

arXiv: 1906.06442 · 2019-06-18

## TL;DR

This paper shows that tagging back-translated source sentences with an extra token is a simpler alternative to noising in neural machine translation. On WMT, tagging outperforms noised back-translation on English-Romanian and matches it on English-German.

## Contribution

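The authors propose prepending an extra token to back-translated source sentences, a simpler alternative to noising that outperforms noised back-translation on WMT English-Romanian and matches it on English-German. They also argue that the main role of synthetic noise is to tell the model that a source sentence is synthetic, not to diversify the source side as previously suggested.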
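The tagging idea can be illustrated with a minimal sketch. This is not the authors' code: the tag string `<BT>`, the helper names `tag_back_translated` and `build_training_set`, and the example sentence pairs are assumptions made for illustration. The sketch assumes training data is held as `(source, target)` string pairs and that the tag is reserved in the source vocabulary so it survives tokenization as a single token.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)

# Hypothetical tag string; it should be a reserved symbol in the source
# vocabulary so that subword tokenization does not split it.
BT_TAG = "<BT>"


def tag_back_translated(pairs: Iterable[Pair], tag: str = BT_TAG) -> List[Pair]:
    """Prepend `tag` to the synthetic source side; targets are left unchanged."""
    return [(f"{tag} {src}", tgt) for src, tgt in pairs]


def build_training_set(
    genuine: Iterable[Pair],
    back_translated: Iterable[Pair],
    tag: str = BT_TAG,
) -> List[Pair]:
    """Concatenate untagged genuine bitext with tagged back-translated pairs."""
    return list(genuine) + tag_back_translated(back_translated, tag)


# Illustrative data: the second pair stands in for a synthetic source
# produced by a reverse (target-to-source) model.
genuine = [("good morning", "Guten Morgen")]
synthetic = [("thanks a lot", "Vielen Dank")]

train = build_training_set(genuine, synthetic)
for src, tgt in train:
    print(f"{src}\t{tgt}")
```

Only the synthetic pairs receive the tag, so the model can tell the two kinds of source apart during training. At inference time the input is genuine text and would be passed without the tag.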

## Key findings

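- The main role of synthetic noise in back-translation is to signal that the source is synthetic, not to diversify the source side.
- Tagging outperforms noised back-translation on WMT English-Romanian and matches it on English-German.
- The English-Romanian results set a new state of the art.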

## Abstract

Recent work in Neural Machine Translation (NMT) has shown significant quality gains from noised-beam decoding during back-translation, a method to generate synthetic parallel data. We show that the main role of such synthetic noise is not to diversify the source side, as previously suggested, but simply to indicate to the model that the given source is synthetic. We propose a simpler alternative to noising techniques, consisting of tagging back-translated source sentences with an extra token. Our results on WMT outperform noised back-translation in English-Romanian and match performance on English-German, re-defining state-of-the-art in the former.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.06442/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/1906.06442/full.md

## References

48 references — full list in the complete paper: https://tomesphere.com/paper/1906.06442/full.md

---
Source: https://tomesphere.com/paper/1906.06442