Active Learning for Crowd-Sourced Databases
Barzan Mozafari, Purnamrita Sarkar, Michael J. Franklin, Michael I. Jordan, Samuel Madden

TL;DR
This paper introduces active learning algorithms that effectively combine human input and machine learning to scale crowd-sourced databases, significantly reducing labeling costs while maintaining high accuracy.
Contribution
It presents two novel active learning algorithms based on non-parametric bootstrap theory that improve efficiency in crowd-sourced data labeling tasks.
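To make the bootstrap idea concrete, here is a minimal sketch of how resampling can rank unlabeled items by uncertainty so that only the most ambiguous ones go to the crowd. It is an illustration under stated assumptions, not the authors' algorithms: the nearest-centroid classifier, the 1-D data, the p(1 - p) vote-variance score, and all function names (`train_centroids`, `bootstrap_uncertainty`, `pick_for_crowd`) are hypothetical choices made for this example.

```python
import random
from statistics import mean


def train_centroids(points, labels):
    """Fit a toy nearest-centroid classifier on 1-D points with 0/1 labels."""
    neg = [x for x, y in zip(points, labels) if y == 0]
    pos = [x for x, y in zip(points, labels) if y == 1]
    if not neg or not pos:
        return None  # this resample is missing a class; caller skips it
    return mean(neg), mean(pos)


def predict(model, x):
    """Return the label of the closer centroid (ties go to class 0)."""
    c0, c1 = model
    return 1 if abs(x - c1) < abs(x - c0) else 0


def bootstrap_uncertainty(points, labels, unlabeled, k=50, seed=0):
    """Score each unlabeled item by the variance of its bootstrap votes.

    Assumption for this sketch: the classifier is retrained on k resamples
    (drawn with replacement) of the labeled data, and an item's score is
    p * (1 - p), where p is the fraction of resamples voting for class 1.
    """
    rng = random.Random(seed)
    n = len(points)
    votes = [[] for _ in unlabeled]
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]
        model = train_centroids([points[i] for i in idx],
                                [labels[i] for i in idx])
        if model is None:
            continue
        for j, x in enumerate(unlabeled):
            votes[j].append(predict(model, x))
    scores = []
    for v in votes:
        p = mean(v) if v else 0.5  # no usable resample: treat as maximally unsure
        scores.append(p * (1 - p))
    return scores


def pick_for_crowd(unlabeled, scores, budget):
    """Return the `budget` items with the highest uncertainty scores."""
    ranked = sorted(range(len(unlabeled)), key=lambda j: scores[j], reverse=True)
    return [unlabeled[j] for j in ranked[:budget]]


if __name__ == "__main__":
    labeled_x = [0.0, 0.5, 1.0, 1.5, 8.5, 9.0, 9.5, 10.0]
    labeled_y = [0, 0, 0, 0, 1, 1, 1, 1]
    candidates = [0.2, 5.0, 9.8]
    scores = bootstrap_uncertainty(labeled_x, labeled_y, candidates)
    print(pick_for_crowd(candidates, scores, budget=1))
```

In this toy setup, items far from the decision boundary receive the same vote in every resample and score zero, so the labeling budget is spent on items the resampled classifiers disagree about; the remaining items can be labeled by the classifier itself.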
Findings
Ask humans to label 10-100 times fewer items than a baseline that labels randomly chosen items, for the same accuracy.
Ask 2-8 times fewer questions than previous active learning methods.
Demonstrate effectiveness on real-world datasets from Amazon Mechanical Turk and UCI.
Abstract
Crowd-sourcing has become a popular means of acquiring labeled data for a wide variety of tasks where humans are more accurate than computers, e.g., labeling images, matching objects, or analyzing sentiment. However, relying solely on the crowd is often impractical even for data sets with thousands of items, due to time and cost constraints of acquiring human input (which cost pennies and minutes per label). In this paper, we propose algorithms for integrating machine learning into crowd-sourced databases, with the goal of allowing crowd-sourcing applications to scale, i.e., to handle larger datasets at lower costs. The key observation is that, in many of the above tasks, humans and machine learning algorithms can be complementary, as humans are often more accurate but slow and expensive, while algorithms are usually less accurate, but faster and cheaper. Based on this observation, we…
Taxonomy
Topics: Machine Learning and Algorithms · Mobile Crowdsensing and Crowdsourcing · Data Stream Mining Techniques
