# The University of Edinburgh's Submissions to the WMT19 News Translation Task

**Authors:** Rachel Bawden, Nikolay Bogoychev, Ulrich Germann, Roman Grundkiewicz, Faheem Kirefu, Antonio Valerio Miceli Barone, Alexandra Birch

arXiv: 1907.05854 · 2019-07-15

## TL;DR

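This paper describes the University of Edinburgh's submissions to the WMT19 News Translation task in six language directions (English-Gujarati and English-Chinese in both directions, German-to-English, and English-to-Czech). The systems rely on back-translation, semi-supervised MT with cross-lingual language model pre-training, pivoting, and different tokenisation strategies to improve translation quality.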

## Contribution

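The paper reports experiments on back-translation for all six directions, cross-lingual language model pre-training and pivoting through Hindi for English-Gujarati, character-based versus sub-word tokenisation for Chinese, very large amounts of back-translated data for German-to-English, and pre-processing and tokenisation regimes for English-to-Czech.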

## Key findings

- Back-translation improves translation quality across all language pairs.
- Semi-supervised MT with cross-lingual pre-training benefits low-resource languages.
- Character-based tokenization offers advantages for Chinese translation.
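
To make the term concrete, character-based tokenisation can be sketched in a few lines of Python. This is an illustration only, not the authors' pre-processing pipeline: the function name `char_tokenise` and the choice to keep runs of Latin letters and digits together as one token are assumptions made for this example.

```python
import re

# Assumption for this sketch: runs of ASCII letters/digits stay together,
# and every other non-whitespace character becomes its own token.
_TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+|\S")


def char_tokenise(text: str) -> list[str]:
    """Split text into character-level tokens, dropping whitespace."""
    return _TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    # Each Chinese character becomes a separate token.
    print(char_tokenise("WMT19 新闻翻译"))
```

A sub-word segmenter would instead learn multi-character units from data, so the same sentence would typically yield fewer, longer tokens.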

## Abstract

The University of Edinburgh participated in the WMT19 Shared Task on News Translation in six language directions: English-to-Gujarati, Gujarati-to-English, English-to-Chinese, Chinese-to-English, German-to-English, and English-to-Czech. For all translation directions, we created or used back-translations of monolingual data in the target language as additional synthetic training data. For English-Gujarati, we also explored semi-supervised MT with cross-lingual language model pre-training, and translation pivoting through Hindi. For translation to and from Chinese, we investigated character-based tokenisation vs. sub-word segmentation of Chinese text. For German-to-English, we studied the impact of vast amounts of back-translated training data on translation quality, gaining a few additional insights over Edunov et al. (2018). For English-to-Czech, we compared different pre-processing and tokenisation regimes.
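
The two data-related ideas in the abstract, back-translation and pivoting, can be sketched as follows. This is a minimal illustration under stated assumptions, not the submitted systems: the names `back_translate`, `pivot`, and `toy_translator` are hypothetical, and a word-by-word lexicon lookup stands in for a trained translation model.

```python
from typing import Callable, Iterable

Translator = Callable[[str], str]
Pair = tuple[str, str]  # (source sentence, target sentence)


def back_translate(mono_target: Iterable[str], reverse_model: Translator) -> list[Pair]:
    """Build synthetic pairs: machine-made source, genuine target."""
    return [(reverse_model(sentence), sentence) for sentence in mono_target]


def pivot(src_to_pivot: Translator, pivot_to_tgt: Translator) -> Translator:
    """Chain two translators through an intermediate (pivot) language."""
    return lambda sentence: pivot_to_tgt(src_to_pivot(sentence))


def toy_translator(lexicon: dict[str, str]) -> Translator:
    """Hypothetical stand-in for a trained model: word-by-word lookup."""
    return lambda sentence: " ".join(lexicon.get(w, w) for w in sentence.split())


if __name__ == "__main__":
    # German-to-English system: translate English monolingual text into German.
    en_to_de = toy_translator({"good": "guten", "morning": "Morgen"})
    parallel = [("danke", "thanks")]
    synthetic = back_translate(["good morning"], en_to_de)
    training_data = parallel + synthetic  # real plus synthetic pairs
    print(training_data)
```

The key property is that the target side of every synthetic pair is genuine text, so the model still learns to produce fluent target-language output even when the synthetic source side is noisy.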

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.05854/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1907.05854/full.md

## References

26 references — full list in the complete paper: https://tomesphere.com/paper/1907.05854/full.md

---
Source: https://tomesphere.com/paper/1907.05854