The Effect of Translationese in Machine Translation Test Sets

Mike Zhang; Antonio Toral

arXiv:1906.08069 · cs.CL · June 20, 2019 · 6 cites

TL;DR

This paper investigates how translationese in test sets influences human evaluation scores and system rankings in machine translation. It finds that translationese inflates scores, changes system rankings in some cases, and has a larger effect in translation directions where the attainable translation quality is lower.

Contribution

It provides an in-depth analysis of the effect of translationese on test data, using the test sets from the last three editions of WMT's news shared task (17 translation directions), and shows its influence on human evaluation scores and system rankings.

Findings

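01 Translationese in test sets inflates human evaluation scores for MT systems.

02 In some cases, system rankings change because of translationese.

03 The impact of translationese on a translation direction is inversely correlated with the translation quality attainable by state-of-the-art MT systems for that direction.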

Abstract

The effect of translationese has been studied in the field of machine translation (MT), mostly with respect to training data. We study in depth the effect of translationese on test data, using the test sets from the last three editions of WMT's news shared task, containing 17 translation directions. We show evidence that (i) the use of translationese in test sets results in inflated human evaluation scores for MT systems; (ii) in some cases system rankings do change and (iii) the impact translationese has on a translation direction is inversely correlated to the translation quality attainable by state-of-the-art MT systems for that direction.
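
Finding (i) can be pictured as a comparison between two subsets of a test set: segments whose source text was originally written in the source language, and segments whose source text is itself a translation (translationese). The sketch below is only an illustration of that comparison, not the authors' procedure. It assumes each segment carries a flag for whether its source is original and a single human evaluation score; the function name `translationese_inflation`, the data layout, and the toy scores are all hypothetical.

```python
from statistics import mean


def translationese_inflation(segments):
    """Compare mean human scores on original vs. translationese source segments.

    `segments` is an iterable of (source_is_original, score) pairs. This layout
    is an assumption made for illustration, not the paper's data format.
    Returns (mean_original, mean_translationese, difference), where a positive
    difference means the translationese subset received higher scores.
    """
    original = [score for is_orig, score in segments if is_orig]
    translationese = [score for is_orig, score in segments if not is_orig]
    if not original or not translationese:
        raise ValueError("need at least one segment in each subset")
    mean_orig = mean(original)
    mean_trans = mean(translationese)
    return mean_orig, mean_trans, mean_trans - mean_orig


# Toy, made-up scores purely for illustration (not results from the paper).
toy_segments = [(True, 60.0), (True, 70.0), (False, 75.0), (False, 85.0)]
mean_orig, mean_trans, diff = translationese_inflation(toy_segments)
print(mean_orig, mean_trans, diff)
```

Running the same comparison per translation direction would give one difference per direction, which could then be set against a measure of attainable translation quality to examine the relationship described in finding (iii).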

Code & Models

Repositories

jjzha/translationese (official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Software Engineering Research