AI-Assisted Human Evaluation of Machine Translation
Vilém Zouhar, Tom Kocmi, Mrinmaya Sachan

TL;DR
This paper introduces an AI-assisted annotation protocol for machine translation evaluation that pre-fills error spans using automatic quality estimation, halving the time per annotated error span and reducing costs while maintaining annotation quality and minimizing bias.
Contribution
It presents a novel AI-assisted annotation method that improves efficiency and reduces costs in human evaluation of machine translation quality.
Findings
AI assistance roughly halves the time per error span annotation (71s to 31s)
Pre-filled error spans prime annotators before they assign the final score, with annotation quality kept at the same level
Filtering reduces annotation budget by nearly 25%
Abstract
Annually, research teams spend large amounts of money to evaluate the quality of machine translation systems (WMT, inter alia). This is expensive because it requires a lot of expert human labor. In the recently adopted annotation protocol, Error Span Annotation (ESA), annotators mark erroneous parts of the translation and then assign a final score. A lot of the annotator time is spent on scanning the translation for possible errors. In our work, we help the annotators by pre-filling the error annotations with recall-oriented automatic quality estimation. With this AI assistance, we obtain annotations at the same quality level while cutting down the time per span annotation by half (71s/error span → 31s/error span). The biggest advantage of the ESA protocol is an accurate priming of annotators (pre-filled error spans) before they assign the final score. This…
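As a rough illustration of what "recall-oriented" pre-filling could look like, the sketch below groups tokens into candidate error spans whenever a per-token error probability crosses a deliberately low threshold. This is not the authors' implementation: the per-token probabilities, the `ErrorSpan` type, the `prefill_spans` function, and the threshold value are all hypothetical names and numbers chosen for illustration. The second helper only reproduces the arithmetic behind the reported 71s to 31s change.

```python
from dataclasses import dataclass


@dataclass
class ErrorSpan:
    start: int  # index of the first token in the span (inclusive)
    end: int    # index one past the last token in the span (exclusive)


def prefill_spans(token_error_probs, threshold=0.2):
    """Group consecutive tokens with error probability >= threshold into spans.

    Assumption: some quality estimation model supplies one probability per
    translation token. A low threshold favours recall: more spans are
    suggested, and the annotator removes the ones that are not real errors.
    """
    spans = []
    start = None
    for i, prob in enumerate(token_error_probs):
        if prob >= threshold:
            if start is None:
                start = i  # open a new span
        elif start is not None:
            spans.append(ErrorSpan(start, i))  # close the current span
            start = None
    if start is not None:
        # The last span runs to the end of the translation.
        spans.append(ErrorSpan(start, len(token_error_probs)))
    return spans


def relative_time_saving(before_seconds, after_seconds):
    """Fraction of time saved when going from before_seconds to after_seconds."""
    return 1 - after_seconds / before_seconds


if __name__ == "__main__":
    probs = [0.05, 0.30, 0.60, 0.10, 0.25]  # made-up values
    print(prefill_spans(probs, threshold=0.2))
    print(prefill_spans(probs, threshold=0.5))
    print(round(relative_time_saving(71, 31), 3))
```

Lowering the threshold can only add or widen suggested spans, which is the sense in which the pre-fill trades precision for recall. With the figures quoted in the abstract, the per-span saving works out to 40/71, a little over half.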
Taxonomy
Topics: Natural Language Processing Techniques
