Scientific Credibility of Machine Translation Research: A Meta-Evaluation of 769 Papers

Benjamin Marie; Atsushi Fujita; Raphael Rubino

arXiv:2106.15195·cs.CL·June 30, 2021

TL;DR

This study examines how credibly machine translation research was evaluated in 769 papers published from 2010 to 2020. It finds concerning trends, such as conclusions drawn from BLEU score differences alone and little adoption of standardized score reporting, and proposes guidelines for improvement.

Contribution

It provides the first large-scale meta-evaluation of MT papers, highlighting evaluation practices and proposing standards to enhance credibility.

Findings

01

Increased reliance on BLEU scores without significance testing.

02

At least 108 metrics claiming to be better than BLEU have been proposed, yet BLEU remains the basis for most conclusions.

03

Tools for reporting standardized metric scores are still not widely adopted.

Abstract

This paper presents the first large-scale meta-evaluation of machine translation (MT). We annotated MT evaluations conducted in 769 research papers published from 2010 to 2020. Our study shows that practices for automatic MT evaluation have dramatically changed during the past decade and follow concerning trends. An increasing number of MT evaluations rely exclusively on differences between BLEU scores to draw conclusions, without performing any statistical significance testing or human evaluation, even though at least 108 metrics claiming to be better than BLEU have been proposed. MT evaluations in recent papers tend to copy and compare automatic metric scores from previous work to claim the superiority of a method or an algorithm, without confirming either that exactly the same training, validation, and test data were used or that the metric scores are comparable. Furthermore, tools…
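To make concrete what "statistical significance testing" of a score difference can look like, here is a minimal sketch of paired bootstrap resampling, a test commonly used in MT evaluation. It is an illustration only, not the procedure used or recommended by the paper. The function name `paired_bootstrap` and its parameters are hypothetical, and for brevity the sketch averages per-segment scores; a real BLEU comparison would recompute the corpus-level metric on each resampled test set.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=1000, seed=0):
    """Return the fraction of resampled test sets on which system A wins.

    scores_a, scores_b: per-segment scores of two systems on the SAME
    test segments, in the same order (hypothetical inputs; any
    segment-level metric where higher is better).
    Simplifying assumption: the system score is the mean segment score.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")

    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        # Draw n segment indices with replacement; use the same indices
        # for both systems so the comparison stays paired.
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a > mean_b:
            wins += 1
    return wins / n_resamples


if __name__ == "__main__":
    # Toy scores, made up purely for illustration.
    system_a = [31, 28, 35, 40, 22, 30, 27, 33]
    system_b = [30, 29, 33, 38, 23, 29, 28, 31]
    win_rate = paired_bootstrap(system_a, system_b)
    print(f"A beats B on {win_rate:.1%} of resampled test sets")
```

A win rate close to 1 (for example, above 0.95) suggests the observed difference is unlikely to be an artifact of which segments happened to be in the test set; a rate near 0.5 suggests the two systems cannot be separated on this data.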
