# CUNI Systems for the Unsupervised News Translation Task in WMT 2019

**Authors:** Ivana Kvapilíková, Dominik Macháček, Ondřej Bojar

arXiv: 1907.12664 · 2019-07-31

## TL;DR

This paper presents the CUNI unsupervised news translation system for WMT 2019. It combines a phrase-based seed model initialized from cross-lingual embeddings with a neural translation model trained on synthetic parallel data, and reaches a BLEU score of 15.3 on German-Czech.

## Contribution

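Following the strategy of Artetxe et al. (2018b), the paper builds an unsupervised translation system from cross-lingual embeddings and synthetic parallel data, and adds special handling of named entities, the part of the vocabulary where the embedding mapping suffers most.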

## Key findings

- Reached a BLEU score of 15.3 on the German-Czech WMT19 shared task.
- Combined a phrase-based seed system with a neural model trained on synthetic parallel data.
- Focused on named entities, where the cross-lingual embedding mapping suffers most.

## Abstract

In this paper we describe the CUNI translation system used for the unsupervised news shared task of the ACL 2019 Fourth Conference on Machine Translation (WMT19). We follow the strategy of Artetxe et al. (2018b), creating a seed phrase-based system where the phrase table is initialized from cross-lingual embedding mappings trained on monolingual data, followed by a neural machine translation system trained on synthetic parallel data. The synthetic corpus was produced from a monolingual corpus by a tuned PBMT model refined through iterative back-translation. We further focus on the handling of named entities, i.e. the part of the vocabulary where the cross-lingual embedding mapping suffers most. Our system reaches a BLEU score of 15.3 on the German-Czech WMT19 shared task.
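To make the phrase-table initialization step more concrete, here is a minimal sketch of how translation scores might be induced from embeddings that have already been mapped into a shared space. It is an illustration under stated assumptions, not the authors' implementation: the function names (`cosine`, `induce_phrase_table`), the toy vectors, the `temperature` value, and the choice to normalize only over the `top_k` nearest candidates are all hypothetical. Scoring candidates with a softmax over cosine similarity is one common choice in this line of work.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def induce_phrase_table(src_emb, tgt_emb, temperature=0.1, top_k=2):
    """Build a toy phrase table: source entry -> {target entry: probability}.

    Assumes src_emb and tgt_emb are already mapped into one shared space.
    Probabilities are a softmax over cosine similarities, restricted to
    the top_k most similar target entries (a simplification).
    """
    table = {}
    for src, u in src_emb.items():
        sims = {tgt: cosine(u, v) for tgt, v in tgt_emb.items()}
        best = sorted(sims, key=sims.get, reverse=True)[:top_k]
        # Subtract the maximum before exponentiating for numerical stability.
        max_sim = max(sims[t] for t in best)
        weights = {t: math.exp((sims[t] - max_sim) / temperature) for t in best}
        total = sum(weights.values())
        table[src] = {t: w / total for t, w in weights.items()}
    return table


# Made-up 3-dimensional vectors standing in for mapped German and Czech embeddings.
de = {"hund": [0.9, 0.1, 0.0], "haus": [0.0, 0.2, 0.9]}
cs = {"pes": [0.8, 0.2, 0.1], "dům": [0.1, 0.1, 0.9], "kočka": [0.6, 0.6, 0.0]}

phrase_table = induce_phrase_table(de, cs)
for src, candidates in phrase_table.items():
    print(src, candidates)
```

In a real pipeline, scores like these would fill the translation-probability fields of a phrase table consumed by a phrase-based decoder. Rare words and named entities tend to have unreliable nearest neighbours in such a space, which is consistent with the abstract's remark that the mapping suffers most there.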

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.12664/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1907.12664/full.md

## References

23 references — full list in the complete paper: https://tomesphere.com/paper/1907.12664/full.md

---
Source: https://tomesphere.com/paper/1907.12664