Empirical Methodology for Crowdsourcing Ground Truth

Anca Dumitrache; Oana Inel; Benjamin Timmermans; Carlos Ortiz; Robert-Jan Sips; Lora Aroyo; Chris Welty

arXiv:1809.08888·cs.HC·September 21, 2022

TL;DR

This paper introduces an empirical methodology using CrowdTruth metrics to improve the quality of crowdsourced ground truth data by capturing annotator disagreement across diverse domains and tasks.

Contribution

It presents a novel empirical approach that emphasizes disagreement measurement for better ground truth quality in crowdsourcing, challenging the reliance on majority voting.

Findings

01 Measuring disagreement improves data quality.

02 More crowd workers lead to more stable annotations.

03 CrowdTruth metrics outperform majority vote in diverse tasks.
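
To make the contrast between majority voting and disagreement-aware scoring concrete, here is a minimal sketch. It is a simplified illustration only, not the CrowdTruth metrics as defined in the paper: the function names, the label names ("cause", "none"), and the cosine-style score are all assumptions chosen for the example. The idea shown is that majority vote collapses every unit to one label, while a per-label score keeps the signal that some units are more ambiguous than others.

```python
from collections import Counter
from math import sqrt


def majority_vote(labels):
    """Return the single most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def label_scores(labels):
    """Score each label by the cosine similarity between the unit's vote
    vector and that label's one-hot vector (hypothetical, simplified metric)."""
    counts = Counter(labels)
    norm = sqrt(sum(c * c for c in counts.values()))
    return {label: c / norm for label, c in counts.items()}


# Hypothetical worker judgments for two units of the same task.
clear_unit = ["cause", "cause", "cause", "cause", "none"]
ambiguous_unit = ["cause", "cause", "cause", "none", "none"]

for name, votes in [("clear", clear_unit), ("ambiguous", ambiguous_unit)]:
    scores = {k: round(v, 3) for k, v in label_scores(votes).items()}
    print(name, majority_vote(votes), scores)
```

Both units receive the same majority label, but the score for "cause" is lower on the second unit, so the disagreement among workers stays visible instead of being discarded.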

Abstract

The process of gathering ground truth data through human annotation is a major bottleneck in the use of information extraction methods for populating the Semantic Web. Crowdsourcing-based approaches are gaining popularity in the attempt to solve the issues related to the volume of data and the lack of annotators. Typically these practices use inter-annotator agreement as a measure of quality. However, in many domains, such as event detection, there is ambiguity in the data, as well as a multitude of perspectives on the information examples. We present an empirically derived methodology for efficiently gathering ground truth data in a diverse set of use cases covering a variety of domains and annotation tasks. Central to our approach is the use of CrowdTruth metrics that capture inter-annotator disagreement. We show that measuring disagreement is essential for acquiring a high quality ground…

Code & Models

Repositories

CrowdTruth/Cross-Task-Majority-Vote-Eval
none · Official
