Enhanced Nearest Neighbor Classification for Crowdsourcing

Jiexin Duan; Xingye Qiao; Guang Cheng

arXiv:2203.00781·cs.HC·March 3, 2022

TL;DR

This paper introduces an enhanced nearest neighbor classifier tailored for crowdsourcing scenarios, effectively handling noisy labels by estimating worker quality and achieving near-oracle performance.

Contribution

The paper proposes two algorithms for estimating worker quality in crowdsourcing: one that relies on expert data and an iterative one that does not. Both improve classification accuracy under label noise and come with theoretical regret guarantees.

Findings

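1. The proposed algorithms achieve the same regret as the oracle version built on high-quality expert data.
2. A lower bound is derived on the sample size assigned to each worker for the regret to reach the optimal convergence rate.
3. Numerical experiments validate the effectiveness of the methods.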

Abstract

In machine learning, crowdsourcing is an economical way to label a large amount of data. However, the noise in the produced labels may deteriorate the accuracy of any classification method applied to the labelled data. We propose an enhanced nearest neighbor classifier (ENN) to overcome this issue. Two algorithms are developed to estimate the worker quality (which is often unknown in practice): one constructs the estimate from denoised worker labels obtained by applying the $k$NN classifier to the expert data; the other is an iterative algorithm that works even without access to the expert data. Beyond strong numerical evidence, our proposed methods are proven to achieve the same regret as their oracle version based on high-quality expert data. As a technical by-product, a lower bound on the sample size assigned to each worker to reach the optimal convergence rate of regret is…
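
The first quality-estimation idea in the abstract (denoise a worker's labels with a $k$NN classifier fit on expert data, then score the worker against those denoised labels) can be illustrated with a minimal sketch. This is not the authors' algorithm: the agreement-rate estimator, the binary 0/1 labels, the Euclidean metric, the toy data, and the names `knn_predict` and `estimate_worker_quality` are all assumptions made for illustration.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k):
    """Majority-vote kNN for binary labels in {0, 1}, Euclidean distance."""
    # Pairwise squared distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest training points for each query point.
    nn_idx = np.argsort(d2, axis=1)[:, :k]
    # Fraction of neighbors labelled 1; use an odd k to avoid vote ties.
    votes = y_train[nn_idx].mean(axis=1)
    return (votes > 0.5).astype(int)


def estimate_worker_quality(X_expert, y_expert, X_worker, y_worker, k=5):
    """Hypothetical estimator: share of worker labels that agree with
    the kNN labels predicted from expert data (an assumed proxy)."""
    denoised = knn_predict(X_expert, y_expert, X_worker, k)
    return float((denoised == y_worker).mean())


# Toy data (assumed): the true label is 1 when the first coordinate is positive.
rng = np.random.default_rng(0)
X_expert = rng.normal(size=(200, 2))
y_expert = (X_expert[:, 0] > 0).astype(int)

X_worker = rng.normal(size=(300, 2))
y_true = (X_worker[:, 0] > 0).astype(int)
flip = rng.random(300) < 0.3  # this worker flips about 30% of labels
y_worker = np.where(flip, 1 - y_true, y_true)

quality = estimate_worker_quality(X_expert, y_expert, X_worker, y_worker, k=5)
print(f"estimated worker quality: {quality:.2f}")
```

In this sketch a worker who labels at random scores near 0.5 and a careful worker scores near the accuracy of the expert-based $k$NN itself, so the estimate is only a rough ranking signal. How the paper turns such estimates into the final ENN classifier is not stated in the text above and is not reproduced here.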

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Distributed Sensor Networks and Detection Algorithms · Advanced Statistical Process Monitoring