Semi-verified PAC Learning from the Crowd

Shiwei Zeng; Jie Shen

arXiv:2106.07080·cs.LG·May 22, 2023

TL;DR

This paper studies semi-verified PAC learning of threshold functions from crowdsourced labels. It tolerates a majority of adversarial workers, with the remaining workers subject to Massart noise, and it uses comparison queries to reduce labeling cost, without relying on distributional assumptions.

Contribution

The paper extends semi-verified PAC learning to a harder crowdsourcing setting in which the majority of workers may be adversarial and the rest are subject to Massart noise, relying on limited access to a trusted oracle together with comparison queries.

Findings

01. PAC learning is feasible with adversarial and noisy crowd workers.

02. Comparison queries significantly reduce labeling costs.

03. Guarantees hold without distributional assumptions.

Abstract

We study the problem of crowdsourced PAC learning of threshold functions. This is a challenging problem and only recently have query-efficient algorithms been established under the assumption that a noticeable fraction of the workers are perfect. In this work, we investigate a more challenging case where the majority may behave adversarially and the rest behave as the Massart noise - a significant generalization of the perfectness assumption. We show that under the semi-verified model of Charikar et al. (2017), where we have (limited) access to a trusted oracle who always returns correct annotations, it is possible to PAC learn the underlying hypothesis class with a manageable amount of label queries. Moreover, we show that the labeling cost can be drastically mitigated via the more easily obtained comparison queries. Orthogonal to recent developments in semi-verified or…
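To give some intuition for why comparison queries can cut labeling cost for threshold functions, here is a minimal Python sketch of a standard idea: order the points using comparisons only, then locate the decision boundary by binary search with label queries. It is an idealized, noise-free illustration and is not the paper's algorithm; it ignores adversarial workers, Massart noise, and the trusted oracle. The names `make_oracles`, `learn_threshold`, and the 1-D real-valued instances are assumptions made for this example.

```python
from functools import cmp_to_key


def make_oracles(threshold):
    """Hypothetical noise-free oracles for a 1-D threshold (illustration only)."""
    counts = {"label": 0, "compare": 0}

    def label(x):
        # Label query: +1 if x lies at or above the hidden threshold, else -1.
        counts["label"] += 1
        return 1 if x >= threshold else -1

    def compare(a, b):
        # Comparison query: negative, zero, or positive, like a classic cmp().
        counts["compare"] += 1
        return (a > b) - (a < b)

    return label, compare, counts


def learn_threshold(points, label, compare):
    """Sort with comparison queries, then binary-search with label queries."""
    ordered = sorted(points, key=cmp_to_key(compare))
    lo, hi = 0, len(ordered)
    while lo < hi:
        mid = (lo + hi) // 2
        if label(ordered[mid]) == 1:
            hi = mid        # boundary is at mid or to its left
        else:
            lo = mid + 1    # boundary is strictly to the right of mid
    # ordered[lo:] are predicted positive, ordered[:lo] negative.
    return ordered, lo


points = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4]
label, compare, counts = make_oracles(threshold=0.45)
ordered, boundary = learn_threshold(points, label, compare)
```

In this toy setting, labeling n points takes a number of label queries that grows only logarithmically in n, with the sorting work shifted onto comparison queries. Handling unreliable workers, as the paper does, requires additional machinery that this sketch does not attempt.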

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Mobile Crowdsensing and Crowdsourcing · Imbalanced Data Classification Techniques