With a Little Help from the Authors: Reproducing Human Evaluation of an MT Error Detector

Ondřej Plátek; Mateusz Lango; Ondřej Dušek

arXiv:2308.06527·cs.CL·August 15, 2023

TL;DR

This paper attempts to reproduce a human evaluation study of an MT error detector, confirming its main conclusions but highlighting variability in human annotations and reproducibility challenges.

Contribution

The paper provides a detailed reproduction of a previous human evaluation experiment and discusses reproducibility issues and variability in human annotations.

Findings

01. Replicated results generally confirm original conclusions
02. Identified high variability in human annotation
03. Highlighted reproducibility challenges in human evaluation

Abstract

This work presents our efforts to reproduce the results of the human evaluation experiment presented in the paper of Vamvas and Sennrich (2022), which evaluated an automatic system detecting over- and undertranslations (translations containing more or less information than the original) in machine translation (MT) outputs. Despite the high quality of the documentation and code provided by the authors, we discuss some problems we found in reproducing the exact experimental setup and offer recommendations for improving reproducibility. Our replicated results generally confirm the conclusions of the original study, but in some cases, statistically significant differences were observed, suggesting a high variability of human annotation.

Code & Models

Repositories

oplatek/reprohum-as-little-as-possible (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling