Crowd & Prejudice: An Impossibility Theorem for Crowd Labelling without a Gold Standard
Nicolás Della Penna, Mark D. Reid

TL;DR
This paper proves that crowd labeling algorithms without initial trusted data can fail due to shared prejudices, but introducing a small amount of gold standard data can prevent such failures.
Contribution
It provides a game-theoretic impossibility theorem showing the limitations of crowd labeling without ground truth and highlights the importance of minimal trusted data.
Findings
Shared prejudices lead to an equilibrium in which all workers report the prejudiced label
A small amount of gold standard data can eliminate prejudiced equilibria
Algorithms relying solely on shared prejudices without trusted data are fundamentally limited
Abstract
A common use of crowdsourcing is to obtain labels for a dataset. Several algorithms have been proposed to identify uninformative members of the crowd so that their labels can be disregarded and the cost of paying them avoided. One common motivation for these algorithms is to do without any initial set of trusted labeled data. We analyse this class of algorithms as mechanisms in a game-theoretic setting to understand the incentives they create for workers. We find an impossibility result: without any ground truth, and when workers have access to commonly shared 'prejudices' that they agree upon but that are not informative of the true labels, there are always equilibria in which all agents report the prejudice. A small amount of gold standard data is found to be sufficient to rule out these equilibria.
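The equilibrium argument in the abstract can be checked by hand in a toy model. The sketch below is not the paper's mechanism. It assumes binary labels, a shared prejudice that is a fair coin independent of the true label, workers whose private signal is correct with probability `accuracy`, and a simple output-agreement payment (a worker is paid 1 for matching a randomly chosen peer) standing in for the no-gold-standard algorithms the paper analyses. A fraction `gold_rate` of items is treated as gold and pays `gold_reward` for a correct label. All names and parameters (`expected_payment`, `is_equilibrium`, `accuracy`, `gold_rate`, `gold_reward`) are hypothetical. Payoffs are computed exactly by enumerating every outcome.

```python
from fractions import Fraction
from itertools import product

# The two pure strategies in this hypothetical toy model.
STRATEGIES = ("truth", "prejudice")


def report(strategy, signal, prejudice):
    """Label a worker submits: its private signal, or the shared prejudice."""
    return signal if strategy == "truth" else prejudice


def expected_payment(own, others, accuracy, gold_rate, gold_reward):
    """Exact expected payment to one worker playing `own` while every other
    worker plays `others`.

    Assumptions (toy model, not the paper's mechanism): the true label y and
    the shared prejudice z are independent fair coins; each worker's private
    signal equals y with probability `accuracy`. With probability `gold_rate`
    the item is gold and pays `gold_reward` if the report equals y; otherwise
    the worker is paid 1 if its report agrees with one random peer's report.
    """
    half = Fraction(1, 2)
    total = Fraction(0)
    for y, z, s_own, s_peer in product((0, 1), repeat=4):
        # Joint probability of (y, z, own signal, peer signal).
        p = half * half
        p *= accuracy if s_own == y else 1 - accuracy
        p *= accuracy if s_peer == y else 1 - accuracy
        r_own = report(own, s_own, z)
        r_peer = report(others, s_peer, z)
        gold_pay = gold_reward if r_own == y else 0
        peer_pay = 1 if r_own == r_peer else 0
        total += p * (gold_rate * gold_pay + (1 - gold_rate) * peer_pay)
    return total


def is_equilibrium(strategy, **params):
    """True if no single worker gains by switching to the other modelled
    strategy while everyone else keeps playing `strategy`."""
    stay = expected_payment(strategy, strategy, **params)
    return all(expected_payment(dev, strategy, **params) <= stay
               for dev in STRATEGIES)


if __name__ == "__main__":
    a = Fraction(4, 5)
    no_gold = dict(accuracy=a, gold_rate=Fraction(0), gold_reward=0)
    print(is_equilibrium("prejudice", **no_gold))  # True
    print(is_equilibrium("truth", **no_gold))      # True

    # 5% gold items, each carrying a large reward.
    gold = dict(accuracy=a, gold_rate=Fraction(1, 20), gold_reward=40)
    print(is_equilibrium("prejudice", **gold))     # False
    print(is_equilibrium("truth", **gold))         # True
```

With no gold items, everyone reporting the prejudice is an equilibrium and even pays more than universal truthful reporting (1 versus 17/25), so agreement-based payments alone cannot penalise it. In this toy model the prejudiced equilibrium disappears exactly when `gold_rate * gold_reward * (accuracy - 1/2) > (1 - gold_rate) / 2`, so a small fraction of gold items suffices only if each carries a large enough stake. The mechanism the paper uses to rule out these equilibria may differ.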
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Auction Theory and Applications · Privacy, Security, and Data Protection
