TL;DR
This paper introduces an empirical methodology using CrowdTruth metrics to improve the quality of crowdsourced ground truth data by capturing annotator disagreement across diverse domains and tasks.
Contribution
It presents a novel empirical approach that emphasizes disagreement measurement for better ground truth quality in crowdsourcing, challenging the reliance on majority voting.
Findings
Measuring inter-annotator disagreement improves the quality of the collected ground truth.
More crowd workers lead to more stable annotations.
CrowdTruth metrics outperform majority vote in diverse tasks.
Abstract
The process of gathering ground truth data through human annotation is a major bottleneck in the use of information extraction methods for populating the Semantic Web. Crowdsourcing-based approaches are gaining popularity in the attempt to solve the issues related to volume of data and lack of annotators. Typically these practices use inter-annotator agreement as a measure of quality. However, in many domains, such as event detection, there is ambiguity in the data, as well as a multitude of perspectives of the information examples. We present an empirically derived methodology for efficiently gathering ground truth data in a diverse set of use cases covering a variety of domains and annotation tasks. Central to our approach is the use of CrowdTruth metrics that capture inter-annotator disagreement. We show that measuring disagreement is essential for acquiring a high quality ground…
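To make the contrast between majority voting and a disagreement-aware score concrete, here is a minimal sketch. It is not the paper's exact metric definitions: the function names (`majority_vote`, `annotation_scores`), the label names, and the toy data are all assumptions made for illustration. The score used here is the cosine similarity between the summed worker vote vector for one unit and the one-hot vector of each label, a simplified form of the vector-based scoring associated with CrowdTruth.

```python
import math
from collections import Counter


def majority_vote(worker_labels):
    """Return the single label chosen most often across all workers."""
    counts = Counter(label for labels in worker_labels for label in labels)
    return counts.most_common(1)[0][0]


def annotation_scores(worker_labels, label_set):
    """Score each label by cosine similarity between the summed vote
    vector for the unit and that label's one-hot vector.

    Illustrative simplification: the cosine reduces to
    count(label) / ||counts||, so a score of 1.0 means every vote went
    to that label, and lower scores indicate the votes were spread out.
    """
    counts = Counter(label for labels in worker_labels for label in labels)
    norm = math.sqrt(sum(counts[label] ** 2 for label in label_set))
    if norm == 0:
        return {label: 0.0 for label in label_set}
    return {label: counts[label] / norm for label in label_set}


# Hypothetical toy data: each inner list holds the labels one worker selected.
labels = ["event", "not_event"]
clear_unit = [["event"]] * 5
ambiguous_unit = [["event"]] * 3 + [["not_event"]] * 2

clear_scores = annotation_scores(clear_unit, labels)
ambiguous_scores = annotation_scores(ambiguous_unit, labels)

print(majority_vote(clear_unit), clear_scores)
print(majority_vote(ambiguous_unit), ambiguous_scores)
```

Majority vote returns the same label for both units, so the ambiguity of the second unit is lost. The per-label scores keep that information: the clear unit scores 1.0 for its label, while the ambiguous unit yields two mid-range scores that signal a contested example.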
