A Permutation-based Model for Crowd Labeling: Optimal Estimation and Robustness

Nihar B. Shah; Sivaraman Balakrishnan; Martin J. Wainwright

arXiv:1606.09632·cs.LG·January 12, 2021

TL;DR

This paper introduces a permutation-based model for crowd labeling that generalizes the classical Dawid-Skene model, derives minimax rates that are sharp up to logarithmic factors, and demonstrates robustness through theoretical analysis and empirical validation.

Contribution

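It proposes a new permutation-based model for crowd labeling together with a new error metric for comparing estimators, derives minimax rates that are sharp up to logarithmic factors, and develops two computationally efficient estimators, WAN and OBI-WAN, each with non-asymptotic performance bounds.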

Findings

01

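The minimax rates for the permutation-based model are sharp up to logarithmic factors and match the minimax lower bounds derived under the simpler Dawid-Skene model.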

02

The WAN and OBI-WAN estimators perform well in simulations.

03

Experimental results validate theoretical predictions.

Abstract

The task of aggregating and denoising crowd-labeled data has gained increased significance with the advent of crowdsourcing platforms and massive datasets. We propose a permutation-based model for crowd labeled data that is a significant generalization of the classical Dawid-Skene model, and introduce a new error metric by which to compare different estimators. We derive global minimax rates for the permutation-based model that are sharp up to logarithmic factors, and match the minimax lower bounds derived under the simpler Dawid-Skene model. We then design two computationally-efficient estimators: the WAN estimator for the setting where the ordering of workers in terms of their abilities is approximately known, and the OBI-WAN estimator where that is not known. For each of these estimators, we provide non-asymptotic bounds on their performance. We conduct synthetic simulations and…
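To make the aggregation task in the abstract concrete, the following is a minimal sketch of a simulated crowd under a one-coin Dawid-Skene setup, aggregated by plain majority voting. Everything here is an illustrative assumption rather than the paper's method: the function names (`simulate_one_coin`, `majority_vote`, `hamming_error`) are hypothetical, labels are assumed to be binary and coded as -1/+1, each worker is assumed to answer every question, and majority voting is only a familiar baseline, not the WAN or OBI-WAN estimator. The Hamming error used below is likewise a stand-in, not the new error metric the paper introduces.

```python
import random


def simulate_one_coin(true_labels, accuracies, seed=0):
    """Return responses[i][j] in {-1, +1}: worker i's answer to question j.

    Assumed one-coin Dawid-Skene setup: worker i answers correctly with
    probability accuracies[i], regardless of which question is asked.
    """
    rng = random.Random(seed)
    return [
        [x if rng.random() < q else -x for x in true_labels]
        for q in accuracies
    ]


def majority_vote(responses):
    """Baseline aggregator (not WAN or OBI-WAN): sign of each column sum.

    Ties are broken in favor of +1.
    """
    num_questions = len(responses[0])
    return [
        1 if sum(row[j] for row in responses) >= 0 else -1
        for j in range(num_questions)
    ]


def hamming_error(estimate, true_labels):
    """Fraction of questions whose estimated label differs from the truth."""
    wrong = sum(e != x for e, x in zip(estimate, true_labels))
    return wrong / len(true_labels)


if __name__ == "__main__":
    labels = [1, -1, 1, 1, -1, -1, 1, -1]
    # Hypothetical worker accuracies, chosen only for illustration.
    responses = simulate_one_coin(labels, [0.9, 0.8, 0.6, 0.55, 0.7])
    estimate = majority_vote(responses)
    print("estimated labels:", estimate)
    print("Hamming error:", hamming_error(estimate, labels))
```

In this sketch a worker's accuracy is the same for every question, which is the restriction the abstract describes the permutation-based model as generalizing. The sketch does not attempt to reproduce that more general model or its estimators.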
