On User Interfaces for Large-Scale Document-Level Human Evaluation of Machine Translation Outputs

Roman Grundkiewicz, Marcin Junczys-Dowmunt, Christian Federmann, and Tom Kocmi

arXiv:2104.10408 · cs.CL · April 22, 2021 · 1 citation

TL;DR

This paper investigates how different user interface designs affect the quality and reliability of human evaluations of machine translation outputs at the document level, highlighting a trade-off between assessment quality and time consumption.

Contribution

It compares two evaluation methods and demonstrates that a document-centric interface improves assessment quality and agreement, providing insights for designing better evaluation tools.

Findings

01. Document-centric interface yields higher assessment quality.

02. Improved correlation between segment and document scores.

03. Increased inter-annotator agreement for document scores.

Abstract

Recent studies emphasize the need for document context in human evaluation of machine translations, but little research has been done on the impact of user interfaces on annotator productivity and the reliability of assessments. In this work, we compare human assessment data from the last two WMT evaluation campaigns, collected via two different methods for document-level evaluation. Our analysis shows that a document-centric approach to evaluation, where the annotator is presented with the entire document context on a screen, leads to higher-quality segment- and document-level assessments. It improves the correlation between segment and document scores and increases inter-annotator agreement for document scores, but is considerably more time-consuming for annotators.
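To make the "correlation between segment and document scores" concrete, here is a minimal sketch of one way such a figure could be computed. It is an illustration under assumptions, not the paper's procedure: the abstract does not say which correlation coefficient is used or how segment scores are aggregated, so Pearson correlation and a plain per-document mean are assumed here. The function names, the data layout, and the scores are all hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def segment_document_correlation(documents):
    """Correlate each document's mean segment score with its document score.

    `documents` is a list of (segment_scores, document_score) pairs.
    This layout is an assumption made for the example.
    """
    avg_segment = [mean(segments) for segments, _ in documents]
    doc_scores = [doc_score for _, doc_score in documents]
    return pearson(avg_segment, doc_scores)


# Made-up scores on a 0-100 scale, not data from the paper.
toy = [
    ([80, 90, 70], 82),
    ([40, 55, 60], 50),
    ([95, 85, 90], 93),
]
r = segment_document_correlation(toy)
print(f"segment/document correlation: {r:.3f}")
```

A value close to 1 would mean that annotators' document-level scores track the average of their segment-level scores closely, which is the kind of consistency the abstract reports as improving under the document-centric interface.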

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems