# The University of Helsinki submissions to the WMT19 news translation task

**Authors:** Aarne Talman, Umut Sulubacak, Raúl Vázquez, Yves Scherrer, Sami Virpioja, Alessandro Raganato, Arvi Hurskainen, Jörg Tiedemann

arXiv: 1906.04040 · 2019-06-11

## TL;DR

This paper details the University of Helsinki's submissions to the WMT19 news translation task, emphasizing data cleaning, model comparisons, and segmentation techniques across three language pairs.

## Contribution

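The paper applies multiple data-filtering approaches to the training data, trains sentence-level transformer models and compares document-level translation approaches for English-German, and explores segmentation approaches for Finnish-English and English-Finnish, together with a rule-based system for English-Finnish.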

## Key findings

- Cleaning and filtering the training data with multiple approaches produced much smaller and cleaner training sets.
- For English-German, sentence-level transformer models were trained and different document-level translation approaches were compared.
- For Finnish-English and English-Finnish, the focus was on different segmentation approaches; a rule-based system was also included for English-Finnish.

## Abstract

In this paper, we present the University of Helsinki submissions to the WMT 2019 shared task on news translation in three language pairs: English-German, English-Finnish and Finnish-English. This year, we focused first on cleaning and filtering the training data using multiple data-filtering approaches, resulting in much smaller and cleaner training sets. For English-German, we trained both sentence-level transformer models and compared different document-level translation approaches. For Finnish-English and English-Finnish we focused on different segmentation approaches, and we also included a rule-based system for English-Finnish.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.04040/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1906.04040/full.md

## References

23 references — full list in the complete paper: https://tomesphere.com/paper/1906.04040/full.md

---
Source: https://tomesphere.com/paper/1906.04040