Learning From Noisy Singly-labeled Data
Ashish Khetan, Zachary C. Lipton, Anima Anandkumar

TL;DR
This paper introduces an algorithm for learning from noisy, singly-labeled crowdsourced data. It estimates worker quality and guides how a labeling budget should be allocated, with the goal of improving classifier performance.
Contribution
It proposes an alternating minimization algorithm that estimates worker quality from as few as one label per example and informs how to allocate a labeling budget. Both theoretical analysis and experiments support the approach.
Findings
The algorithm can estimate worker quality with only one label per example.
It outperforms existing methods in noisy label scenarios.
Under a fixed annotation budget, labeling more examples once each can be more effective than collecting multiple labels per example, provided worker quality is sufficiently high.
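The budget trade-off in the last finding can be made concrete with a little arithmetic. The sketch below is illustrative only and rests on assumptions not taken from the paper: binary labels, independent workers who all share one accuracy `q`, and plain majority voting as the aggregation rule. The function names `majority_vote_accuracy` and `budget_tradeoff` are hypothetical. For each redundancy `r` it reports how many distinct examples a budget buys and how often the aggregated label is correct.

```python
from math import comb

def majority_vote_accuracy(q: float, r: int) -> float:
    """P(a majority of r independent binary workers, each correct w.p. q, is right).

    Assumes every worker has the same accuracy q; r must be odd so ties cannot occur.
    """
    if r < 1 or r % 2 == 0:
        raise ValueError("redundancy r must be a positive odd integer")
    need = r // 2 + 1  # votes needed for a strict majority
    return sum(comb(r, k) * q**k * (1 - q) ** (r - k) for k in range(need, r + 1))

def budget_tradeoff(budget: int, q: float, redundancies=(1, 3, 5)):
    """For each redundancy r: (r, distinct examples labeled, aggregated-label accuracy)."""
    return [(r, budget // r, majority_vote_accuracy(q, r)) for r in redundancies]

for r, n_examples, acc in budget_tradeoff(900, 0.8):
    print(f"r={r}: {n_examples} examples, majority-vote accuracy {acc:.3f}")
```

With `q = 0.8` and a budget of 900 labels, the loop reports 900, 300, and 180 examples with aggregated accuracies 0.800, 0.896, and 0.942. Which option trains a better classifier depends on how much the learner gains from extra examples versus cleaner labels; this calculation only exposes the two quantities being traded, not the paper's own analysis.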
Abstract
Supervised learning depends on annotated examples, which are taken to be the ground truth. But these labels often come from noisy crowdsourcing platforms, like Amazon Mechanical Turk. Practitioners typically collect multiple labels per example and aggregate the results to mitigate noise (the classic crowdsourcing problem). Given a fixed annotation budget and unlimited unlabeled data, redundant annotation comes at the expense of fewer labeled examples. This raises two fundamental questions: (1) How can we best learn from noisy workers? (2) How should we allocate our labeling budget to maximize the performance of a classifier? We propose a new algorithm for jointly modeling labels and worker quality from noisy crowd-sourced data. The alternating minimization proceeds in rounds, estimating worker quality from disagreement with the current model and then updating the model by…
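The alternating loop the abstract describes (estimate worker quality from disagreement with the current model, then update the model) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: labels are binary, each example carries exactly one label, each worker is summarized by a single accuracy `q_w` (a "one-coin" model), the classifier is logistic regression fit by gradient descent on standardized features, and the model update minimizes cross-entropy against posterior "soft" labels. The abstract is cut off before stating the actual update rule, so that step is an assumption. All function names are hypothetical.

```python
import numpy as np

def train_logreg(X, targets, lr=0.1, epochs=1000):
    """Logistic regression fit by batch gradient descent on soft targets in [0, 1].

    Assumes roughly standardized features so the fixed step size is stable.
    """
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])  # append a bias column
    w = np.zeros(d + 1)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - targets) / n  # gradient of mean cross-entropy
    return w

def predict_proba(w, X):
    """Model probability P(y = 1 | x)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def alternating_minimization(X, noisy, worker, n_workers, rounds=5):
    """X: (n, d) features; noisy: (n,) labels in {0, 1}; worker: (n,) worker ids.

    Hypothetical sketch assuming a one-coin worker model (accuracy q_w per worker).
    """
    targets = noisy.astype(float)  # round 0: treat every label as correct
    q = np.full(n_workers, 0.5)    # workers with no labels keep this default
    for _ in range(rounds):
        w = train_logreg(X, targets)
        p1 = predict_proba(w, X)
        # Step 1: worker quality = expected agreement with the current model.
        p_agree = np.where(noisy == 1, p1, 1.0 - p1)
        for j in range(n_workers):
            mask = worker == j
            if mask.any():
                q[j] = np.clip(p_agree[mask].mean(), 1e-3, 1 - 1e-3)
        # Step 2: posterior over the true label, combining model and worker quality.
        qw = q[worker]
        like1 = np.where(noisy == 1, qw, 1.0 - qw)  # P(observed label | y = 1)
        like0 = np.where(noisy == 0, qw, 1.0 - qw)  # P(observed label | y = 0)
        targets = p1 * like1 / (p1 * like1 + (1.0 - p1) * like0)
    # Final model update on the last round's soft labels.
    return train_logreg(X, targets), q
```

The point of the sketch is that no example needs a second label: each worker's quality is estimated by pooling agreement with the model across all the examples that worker labeled, and those estimates then reweight how much the next model trusts each label.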
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning
