False Discovery Rate Control and Statistical Quality Assessment of Annotators in Crowdsourced Ranking
Qianqian Xu, Jiechao Xiong, Xiaochun Cao, Yuan Yao

TL;DR
This paper introduces a statistical framework to detect and control position bias among crowdsourced annotators, ensuring the reliability of labels without prior knowledge of biased annotators, supported by experiments on simulated and real data.
Contribution
The paper develops a novel statistical method using knockoff filters and Inverse Scale Space algorithms to identify biased annotators and control false discovery rate in crowdsourcing data.
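To make the idea of a knockoff filter concrete, here is a minimal Python sketch of the generic knockoff+ selection rule from the standard knockoff filter literature. It is not the adapted filter developed in the paper. It assumes each candidate annotator j already has a statistic W_j, where a large positive value suggests a real effect and the sign is equally likely to be positive or negative for an unbiased annotator. The function names and the example values of W are hypothetical.

```python
import numpy as np

def knockoff_plus_threshold(W, q):
    """Return the smallest t among the nonzero |W_j| such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q, or inf if none exists."""
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes, in ascending order.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        # Negative statistics act as a stand-in count for false positives.
        fdp_estimate = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_estimate <= q:
            return float(t)
    return float("inf")

def select(W, q):
    """Indices whose statistic reaches the knockoff+ threshold at target level q."""
    t = knockoff_plus_threshold(W, q)
    return [j for j, w in enumerate(W) if w >= t]

# Hypothetical statistics for eight annotators (illustrative values only).
W = [5.0, 4.0, 3.5, 3.0, -1.0, 2.0, 0.5, -0.2]
flagged = select(W, q=0.4)
```

Lowering the target level q raises the threshold, so fewer annotators are flagged; if no threshold satisfies the inequality, nothing is selected.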
Findings
Effective detection of position bias in simulated data
Successful application to real-world crowdsourcing datasets
Framework ensures high-quality, reliable labels in large-scale annotation tasks
Abstract
With the rapid growth of crowdsourcing platforms, it has become easy and relatively inexpensive to collect a dataset labeled by multiple annotators in a short time. However, due to the lack of control over the quality of the annotators, some abnormal annotators may be affected by position bias, which can potentially degrade the quality of the final consensus labels. In this paper, we introduce a statistical framework to model and detect annotators' position bias while controlling the false discovery rate (FDR) without prior knowledge of the number of biased annotators. That is, the expected fraction of false discoveries among all discoveries is kept from being too high, so that most of the discoveries are indeed true and replicable. The key technical development relies on new knockoff filters adapted to our problem and new algorithms based on the Inverse Scale Space dynamics whose…
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Anomaly Detection Techniques and Applications · Data Stream Mining Techniques
