Using Mechanical Turk to Build Machine Translation Evaluation Sets

Michael Bloodgood; Chris Callison-Burch

arXiv:1410.5491 · cs.CL · October 22, 2014 · 35 citations

TL;DR

This paper explores using Amazon's Mechanical Turk to create cost-effective machine translation test sets, demonstrating that these sets are comparable in quality to professionally-made ones for evaluating system performance.

Contribution

The paper introduces a method for building MT test sets cheaply via MTurk and validates them against professionally produced test sets.

Findings

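01 MTurk test sets are much cheaper to produce than professionally produced ones.

02 MTurk test sets yield essentially the same conclusions about system performance as professional test sets.

03 MTurk offers a cost-effective way to expand MT evaluation resources to more language pairs and domains.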

Abstract

Building machine translation (MT) test sets is a relatively expensive task. As MT becomes increasingly desired for more and more language pairs and more and more domains, it becomes necessary to build test sets for each case. In this paper, we investigate using Amazon's Mechanical Turk (MTurk) to make MT test sets cheaply. We find that MTurk can be used to make test sets much cheaper than professionally-produced test sets. More importantly, in experiments with multiple MT systems, we find that the MTurk-produced test sets yield essentially the same conclusions regarding system performance as the professionally-produced test sets yield.
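
One way to make "essentially the same conclusions regarding system performance" concrete is to check whether two test sets rank a group of MT systems in the same order. The sketch below is only an illustration of that idea: the abstract does not say how the comparison was carried out, so the use of Kendall's tau, the function name `kendall_tau`, the system names, and the score values are all assumptions (the numbers are placeholders, not results from the paper).

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two {system_name: score} mappings.

    Returns 1.0 when both test sets rank every pair of systems the same
    way and -1.0 when every pair is ranked in the opposite order.
    """
    systems = sorted(scores_a)
    if sorted(scores_b) != systems:
        raise ValueError("both score sets must cover the same systems")
    if len(systems) < 2:
        raise ValueError("need at least two systems to compare rankings")

    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        # Same sign means both test sets agree on which system is better.
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # Ties (product == 0) count toward neither total.

    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs


# Placeholder metric scores (hypothetical, NOT taken from the paper).
professional = {"sys_a": 0.30, "sys_b": 0.25, "sys_c": 0.20}
mturk = {"sys_a": 0.28, "sys_b": 0.24, "sys_c": 0.18}

tau = kendall_tau(professional, mturk)
print(f"Kendall's tau between the two rankings: {tau:.2f}")
```

A value near 1 would indicate that the cheaper test set orders the systems the same way as the professional one, even when the absolute scores differ.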

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Topic Modeling · Natural Language Processing Techniques