Inferring ground truth from multi-annotator ordinal data: a probabilistic approach
Balaji Lakshminarayanan, Yee Whye Teh

TL;DR
This paper introduces a probabilistic model for inferring true ordinal labels from crowdsourced data, accounting for annotator expertise and instance difficulty, and demonstrates improved robustness over existing methods.
Contribution
The paper proposes a novel Bayesian model for crowdsourced ordinal data that accounts for both annotator expertise and instance difficulty, together with a variational Bayesian inference algorithm for parameter estimation.
Findings
The model outperforms existing methods on real datasets.
It is more resistant to spammy annotators.
It performs as well as or better than state-of-the-art models.
Abstract
A popular approach for large scale data annotation tasks is crowdsourcing, wherein each data point is labeled by multiple noisy annotators. We consider the problem of inferring ground truth from noisy ordinal labels obtained from multiple annotators of varying and unknown expertise levels. Annotation models for ordinal data have been proposed mostly as extensions of their binary/categorical counterparts and have received little attention in the crowdsourcing literature. We propose a new model for crowdsourced ordinal data that accounts for instance difficulty as well as annotator expertise, and derive a variational Bayesian inference algorithm for parameter estimation. We analyze the ordinal extensions of several state-of-the-art annotator models for binary/categorical labels and evaluate the performance of all the models on two real world datasets containing ordinal query-URL relevance…
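To make the problem setting in the abstract concrete, here is a minimal sketch of aggregating noisy ordinal labels while down-weighting unreliable annotators. It is not the paper's model: it treats ordinal labels as continuous scores with per-annotator Gaussian noise, ignores instance difficulty, and uses a simple alternating update rather than variational Bayesian inference. The function name `aggregate_ordinal`, the `(instance, annotator, label)` input format, and the smoothing constant `eps` are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_ordinal(labels, n_levels, n_iter=20, eps=1e-2):
    """Toy aggregation of ordinal labels (NOT the paper's model).

    labels: list of (instance, annotator, label), label in 1..n_levels.
    Returns (estimated label per instance, noise variance per annotator).
    """
    by_instance = defaultdict(list)
    for inst, ann, y in labels:
        by_instance[inst].append((ann, y))

    # Start from the unweighted mean label of each instance.
    score = {i: sum(y for _, y in obs) / len(obs)
             for i, obs in by_instance.items()}
    variance = {}
    for _ in range(n_iter):
        # Annotator noise: mean squared deviation from the current scores.
        # eps (an assumed smoothing constant) keeps the variance positive.
        sq_err, count = defaultdict(float), defaultdict(int)
        for inst, ann, y in labels:
            sq_err[ann] += (y - score[inst]) ** 2
            count[ann] += 1
        variance = {a: sq_err[a] / count[a] + eps for a in count}

        # Instance score: precision-weighted mean of its labels.
        for i, obs in by_instance.items():
            weights = [1.0 / variance[a] for a, _ in obs]
            total = sum(w * y for w, (_, y) in zip(weights, obs))
            score[i] = total / sum(weights)

    # Map continuous scores back onto the ordinal scale 1..n_levels.
    estimate = {i: min(n_levels, max(1, round(s))) for i, s in score.items()}
    return estimate, variance


if __name__ == "__main__":
    truth = [1, 2, 4, 5]
    # Two consistent annotators plus one who always answers 3.
    data = [(i, a, y) for i, y in enumerate(truth) for a in ("a", "b")]
    data += [(i, "s", 3) for i in range(len(truth))]
    estimate, variance = aggregate_ordinal(data, n_levels=5)
    print(estimate)
    print(variance)
```

In this toy run, the annotator who always answers 3 ends up with a much larger estimated variance than the two consistent annotators, so that annotator's labels carry little weight in the final estimates. A model of the kind the abstract describes would additionally account for instance difficulty and treat the labels as ordinal rather than continuous.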
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Data Stream Mining Techniques · Privacy-Preserving Technologies in Data
