Reassessing Claims of Human Parity and Super-Human Performance in Machine Translation at WMT 2019

Antonio Toral

arXiv:2005.05738 · cs.CL · May 13, 2020 · 24 cites

TL;DR

This paper reassesses the claims of human parity and super-human performance in machine translation made at the WMT 2019 news shared task. Once limitations of the original human evaluation are taken into account, all of those claims are refuted except the claim of human parity for English-to-German.

Contribution

The study identifies key issues in previous human evaluation methods and provides a revised assessment that challenges earlier claims of human parity and super-human translation performance.

Findings

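01. All claims of human parity and super-human performance made at WMT 2019 are refuted, except the claim of human parity for English-to-German.

02. Three potential issues are identified in the original human evaluation: limited intersentential context, limited translation proficiency of the evaluators, and the use of a reference translation.

03. The modified evaluation does not support human parity for English-to-Russian or German-to-English.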

Abstract

We reassess the claims of human parity and super-human performance made at the news shared task of WMT 2019 for three translation directions: English-to-German, English-to-Russian and German-to-English. First we identify three potential issues in the human evaluation of that shared task: (i) the limited amount of intersentential context available, (ii) the limited translation proficiency of the evaluators and (iii) the use of a reference translation. We then conduct a modified evaluation taking these issues into account. Our results indicate that all the claims of human parity and super-human performance made at WMT 2019 should be refuted, except the claim of human parity for English-to-German. Based on our findings, we put forward a set of recommendations and open questions for future assessments of human parity in machine translation.

Code & Models

Repositories

antot/human_parity_eamt2020 (official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Hate Speech and Cyberbullying Detection