Beyond Human-Only: Evaluating Human-Machine Collaboration for Collecting High-Quality Translation Data

Zhongtao Liu; Parker Riley; Daniel Deutsch; Alison Lui; Mengmeng Niu; Apu Shah; Markus Freitag

arXiv:2410.11056 · cs.CL · October 16, 2024

TL;DR

This paper evaluates 11 human-only, machine-only, and hybrid approaches for collecting high-quality translation data, showing that human-machine collaboration can match or exceed human-only quality at a lower cost.

Contribution

The paper provides a comprehensive comparison of 11 translation data collection methods, highlighting the effectiveness and cost-efficiency of hybrid human-machine approaches.

Findings

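1. Human-machine collaboration matches or exceeds the quality of human-only translation.
2. Some hybrid methods reach top-tier quality at around 60% of the cost of traditional human-only collection, a saving of roughly 40%.
3. A new, publicly available dataset with nearly 18,000 translated segments is released.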

Abstract

Collecting high-quality translations is crucial for the development and evaluation of machine translation systems. However, traditional human-only approaches are costly and slow. This study presents a comprehensive investigation of 11 approaches for acquiring translation data, including human-only, machine-only, and hybrid approaches. Our findings demonstrate that human-machine collaboration can match or even exceed the quality of human-only translations, while being more cost-efficient. Error analysis reveals the complementary strengths of human and machine contributions, highlighting the effectiveness of collaborative methods. Cost analysis further demonstrates the economic benefits of human-machine collaboration methods, with some approaches achieving top-tier quality at around 60% of the cost of traditional methods. We release a publicly available dataset containing nearly…
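The abstract states cost as a fraction of the traditional baseline ("around 60% of the cost"), while savings are usually quoted as a reduction. The minimal sketch below, assuming costs are measured in any consistent unit, shows how one converts to the other. The helper name `relative_saving` and the 60/100 figures are hypothetical illustrations, not the paper's cost model or data.

```python
def relative_saving(method_cost: float, baseline_cost: float) -> float:
    """Return the fraction of the baseline cost saved by a cheaper method.

    Hypothetical helper: both costs must use the same unit (for example,
    the price of translating a fixed set of segments). Not from the paper.
    """
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return 1.0 - method_cost / baseline_cost


# Illustrative numbers only: a hybrid workflow costing 60 units
# against a human-only baseline costing 100 units.
saving = relative_saving(60.0, 100.0)
print(f"{saving:.0%}")  # prints "40%"
```

Paying about 60% of the baseline therefore corresponds to a saving of about 40%, which is how the two cost figures on this page relate.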

Videos

Beyond Human-Only: Evaluating Human-Machine Collaboration for Collecting High-Quality Translation Data (Underline)

Taxonomy

Topics: Natural Language Processing Techniques