Semi-verified PAC Learning from the Crowd
Shiwei Zeng, Jie Shen

TL;DR
This paper introduces a semi-verified PAC learning framework for threshold functions in crowdsourcing, handling adversarial and noisy workers, and reducing labeling costs through comparison queries, without relying on distributional assumptions.
Contribution
It extends semi-verified PAC learning to more challenging crowdsourcing scenarios with adversarial workers and noise, using limited trusted labels and comparison queries.
Findings
PAC learning is feasible even when the majority of crowd workers are adversarial and the rest are noisy, given limited access to a trusted oracle.
Comparison queries significantly reduce labeling costs.
Guarantees hold without distributional assumptions.
Abstract
We study the problem of crowdsourced PAC learning of threshold functions. This is a challenging problem, and only recently have query-efficient algorithms been established under the assumption that a noticeable fraction of the workers are perfect. In this work, we investigate a more challenging case where the majority may behave adversarially and the rest behave according to Massart noise - a significant generalization of the perfectness assumption. We show that under the semi-verified model of Charikar et al. (2017), where we have (limited) access to a trusted oracle who always returns correct annotations, it is possible to PAC learn the underlying hypothesis class with a manageable number of label queries. Moreover, we show that the labeling cost can be drastically mitigated via the more easily obtained comparison queries. Orthogonal to recent developments in semi-verified or…
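The intuition for why comparison queries can cut labeling cost is illustrated by the minimal sketch below. It is not the paper's algorithm: it assumes a single, perfectly accurate oracle (no adversarial or noisy workers) and one-dimensional points, and the names `CountingOracle` and `learn_threshold` are hypothetical. The points are first ordered using only comparison queries, after which a binary search locates the decision boundary with a logarithmic number of label queries.

```python
import functools
import random


class CountingOracle:
    """Hypothetical noiseless oracle for a 1-D threshold; counts each query type."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.label_queries = 0
        self.comparison_queries = 0

    def label(self, x):
        # Label query: returns +1 if x is at or above the threshold, else -1.
        self.label_queries += 1
        return 1 if x >= self.threshold else -1

    def compare(self, x, y):
        # Comparison query: negative, zero, or positive, like a standard comparator.
        self.comparison_queries += 1
        return (x > y) - (x < y)


def learn_threshold(points, compare, label):
    """Sort with comparison queries, then binary-search the boundary with label queries."""
    ordered = sorted(points, key=functools.cmp_to_key(compare))
    lo, hi = 0, len(ordered)
    # Invariant: everything before index lo is negative, everything from hi on is positive.
    while lo < hi:
        mid = (lo + hi) // 2
        if label(ordered[mid]) == 1:
            hi = mid
        else:
            lo = mid + 1
    return ordered, lo  # ordered[lo:] are the points predicted positive


rng = random.Random(0)
points = [rng.random() for _ in range(1000)]
oracle = CountingOracle(threshold=0.37)
ordered, boundary = learn_threshold(points, oracle.compare, oracle.label)
print("label queries:", oracle.label_queries)
print("comparison queries:", oracle.comparison_queries)
```

In this idealized setting, all 1000 points end up labeled correctly while only about log2(1000) of them are ever sent to the label oracle; the remaining work is shifted to comparison queries. Handling adversarial and Massart-noise workers, as the abstract describes, needs additional machinery that this sketch does not attempt.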
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Mobile Crowdsensing and Crowdsourcing · Imbalanced Data Classification Techniques
