A Data Prism: Semi-Verified Learning in the Small-Alpha Regime

Michela Meister; Gregory Valiant

arXiv:1708.02740 · cs.LG · August 10, 2017 · 1 citation

Open Access

TL;DR

This paper studies an instance of the semi-verified learning model of [CSV17] for large, noisy, crowdsourced data: given the true values of only a constant number of variables, the learner can recover a $(1 - \epsilon)$ fraction of the $n$ true variable values, provided the fraction $\alpha$ of reliable evaluators exceeds $1/(2 - 2p)^{r}$.

Contribution

It provides an efficient algorithm, with theoretical guarantees, for semi-verified learning in the small-alpha regime, where good evaluators may make up only a small fraction of the crowd.

Findings

01

Recovers a $(1 - \epsilon)$ fraction of the true values, but requires an extremely large number of evaluators, more than $n^{r}$

02

Runs in linear time relative to dataset size

03

Applicable to practical scenarios like extracting cohort preferences from large datasets

Abstract

We consider a model of unreliable or crowdsourced data where there is an underlying set of $n$ binary variables, each evaluator contributes a (possibly unreliable or adversarial) estimate of the values of some subset of $r$ of the variables, and the learner is given the true value of a constant number of variables. We show that, provided an $\alpha$-fraction of the evaluators are "good" (either correct, or with independent noise rate $p < 1/2$), then the true values of a $(1 - \epsilon)$ fraction of the $n$ underlying variables can be deduced as long as $\alpha > 1/(2 - 2p)^{r}$. This setting can be viewed as an instance of the semi-verified learning model introduced in [CSV17], which explores the tradeoff between the number of items evaluated by each worker and the fraction of good evaluators. Our results require the number of evaluators to be extremely large, $> n^{r}$, although our algorithm…
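
To make the condition $\alpha > 1/(2 - 2p)^{r}$ concrete, the following minimal Python sketch evaluates the threshold for a given noise rate $p$ and subset size $r$, and searches for the smallest $r$ that satisfies the inequality for a given $\alpha$. The function names `alpha_threshold` and `min_subset_size`, and the search cap `r_max`, are illustrative assumptions and do not come from the paper; the code only evaluates the stated inequality and is not the paper's recovery algorithm.

```python
def alpha_threshold(p: float, r: int) -> float:
    """Return 1 / (2 - 2p)^r, the bound that alpha must strictly exceed.

    p is the independent noise rate of a good evaluator (0 <= p < 1/2);
    r is the number of variables each evaluator reports on.
    """
    if not 0.0 <= p < 0.5:
        raise ValueError("noise rate p must satisfy 0 <= p < 1/2")
    if r < 1:
        raise ValueError("subset size r must be a positive integer")
    return 1.0 / (2.0 - 2.0 * p) ** r


def min_subset_size(alpha: float, p: float, r_max: int = 10_000):
    """Smallest r with alpha > 1 / (2 - 2p)^r, or None if none is found up to r_max.

    r_max is a hypothetical search cap added for this sketch.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    for r in range(1, r_max + 1):
        if alpha > alpha_threshold(p, r):
            return r
    return None


if __name__ == "__main__":
    # Noiseless good evaluators (p = 0): the bound is 1 / 2^r.
    print(alpha_threshold(0.0, 3))
    # Subset size needed when 10% of evaluators are good, with and without noise.
    print(min_subset_size(0.1, 0.0))
    print(min_subset_size(0.1, 0.25))
```

Because $2 - 2p > 1$ whenever $p < 1/2$, the threshold shrinks geometrically in $r$, so a smaller fraction of good evaluators can be tolerated when each evaluator reports on more variables; a higher noise rate $p$ slows that decay.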

Taxonomy

Topics: Machine Learning and Algorithms · Complexity and Algorithms in Graphs · Distributed Sensor Networks and Detection Algorithms