Inferring the ground truth through crowdsourcing

Jean Pierre Char

arXiv:1807.11836·cs.LG·August 1, 2018

TL;DR

This paper discusses methods for inferring reliable ground truth data from crowdsourced annotations and autonomous agents, especially when true labels are difficult or costly to obtain, emphasizing verification and aggregation techniques.

Contribution

It introduces approaches for inferring and verifying ground truth from crowdsourcing and autonomous agents, addressing challenges in sensitive and complex annotation tasks.

Findings

01 Effective aggregation improves label accuracy

02 Verification processes enhance data reliability

03 Applicable to sensitive domains like medical imaging

Abstract

Universally valid ground truth is almost impossible to obtain, or can be obtained only at a very high cost. For supervised learning without universally valid ground truth, a recommended approach is crowdsourcing: gathering a large data set annotated by multiple individuals of possibly varying expertise levels and inferring the ground truth data to be used as labels to train the classifier. Nevertheless, due to the sensitivity of the problem at hand (e.g. mitosis detection in breast cancer histology images), the obtained data needs verification and proper assessment before being used for classifier training. Even in the context of organic computing systems, an indisputable ground truth might not always exist. Therefore, it should be inferred through the aggregation and verification of the local knowledge of each autonomous agent.
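The abstract does not name a specific aggregation rule, so the following is only a minimal sketch of the general idea, assuming plain majority voting as a baseline and a simple agreement threshold as a stand-in for the verification step. The function names, the example labels, and the 0.75 threshold are illustrative assumptions, not the paper's method.

```python
from collections import Counter


def majority_vote(annotations):
    """Aggregate crowd labels per item by majority vote (assumed baseline).

    annotations: dict mapping item id -> list of labels from different annotators.
    Returns a dict mapping item id -> (inferred_label, agreement), where
    agreement is the fraction of annotators who chose the inferred label.
    """
    aggregated = {}
    for item_id, labels in annotations.items():
        if not labels:
            continue  # nothing to infer for items without annotations
        # On a tie, most_common returns the label that was seen first.
        label, votes = Counter(labels).most_common(1)[0]
        aggregated[item_id] = (label, votes / len(labels))
    return aggregated


def needs_verification(aggregated, threshold=0.75):
    """Return the item ids whose agreement falls below the (hypothetical) threshold."""
    return sorted(
        item_id
        for item_id, (_, agreement) in aggregated.items()
        if agreement < threshold
    )


# Hypothetical annotations from annotators of varying expertise.
annotations = {
    "img_001": ["mitosis", "mitosis", "non-mitosis"],
    "img_002": ["non-mitosis", "non-mitosis", "non-mitosis"],
    "img_003": ["mitosis", "non-mitosis"],
}

aggregated = majority_vote(annotations)
flagged = needs_verification(aggregated)
```

In this sketch, items with low annotator agreement are flagged for expert review rather than used directly as training labels. Majority voting weights every annotator equally; a method that models annotator expertise would replace `majority_vote` while leaving the verification step unchanged.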

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Data Stream Mining Techniques · Privacy-Preserving Technologies in Data