PreCog: Improving Crowdsourced Data Quality Before Acquisition
Hamed Nilforoshan, Jiannan Wang, Eugene Wu

TL;DR
PreCog introduces interface optimizations that give workers feedback so they can improve crowdsourced data quality before submission. It collects 2x more high-quality text data than non-PreCog approaches and is designed to complement, not replace, traditional post-hoc quality control.
Contribution
The paper presents Precog, a system that uses segmentation and prescriptive explanations to improve data quality before the data is submitted. This is a novel pre-hoc approach that complements existing post-hoc techniques.
Findings
Precog collects 2x more high-quality text data than non-Precog methods.
The Segment-Predict-Explain pattern detects low-quality text segments and generates prescriptive explanations that help workers improve them.
Pre-hoc interface optimizations, which give workers feedback before they submit, improve crowdsourced data quality and complement post-hoc quality control.
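The summary does not describe the models Precog uses at each stage, so the following is only a minimal sketch of the shape of the Segment-Predict-Explain pattern. It assumes sentence-level segments and two hand-written heuristic checks as a stand-in for a learned quality predictor; the checks, the threshold, the hint texts, and every function name are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Feedback:
    segment: str
    score: float                # toy quality score in [0, 1]
    explanation: Optional[str]  # prescriptive hint, or None if the segment passes


# Hypothetical checks standing in for a learned quality model: each returns a
# prescriptive hint when the segment looks low-quality, else None.
def too_short(seg: str) -> Optional[str]:
    if len(seg.split()) < 5:
        return "Add more detail: this sentence has fewer than 5 words."
    return None


def lacks_specifics(seg: str) -> Optional[str]:
    if not re.search(r"\d", seg) and "because" not in seg.lower():
        return "Be specific: give a number or a reason (e.g. 'because ...')."
    return None


CHECKS: List[Callable[[str], Optional[str]]] = [too_short, lacks_specifics]


def segment(text: str) -> List[str]:
    """Segment: naive sentence split after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def predict(seg: str) -> float:
    """Predict: fraction of the hypothetical checks the segment passes."""
    return sum(check(seg) is None for check in CHECKS) / len(CHECKS)


def explain(seg: str) -> Optional[str]:
    """Explain: hint from the first failing check, if any."""
    return next((h for h in (check(seg) for check in CHECKS) if h is not None), None)


def segment_predict_explain(text: str, threshold: float = 1.0) -> List[Feedback]:
    results = []
    for seg in segment(text):
        score = predict(seg)
        # Only segments scoring below the threshold get an explanation.
        results.append(Feedback(seg, score, explain(seg) if score < threshold else None))
    return results


if __name__ == "__main__":
    review = ("Great food. The ramen cost $12 and the broth was rich "
              "because it simmers for hours.")
    for fb in segment_predict_explain(review):
        print(f"{fb.score:.1f} {fb.segment!r} -> {fb.explanation}")
```

Tying each check to its own hint is what makes the explanation prescriptive: rather than reporting only a low score, the interface points the worker at a specific sentence and says how to revise it.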
Abstract
Quality control in crowdsourcing systems is crucial. It is typically done after data collection, often using additional crowdsourced tasks to assess and improve the quality. These post-hoc methods can easily add cost and latency to the acquisition process, particularly if collecting high-quality data is important. In this paper, we argue for pre-hoc interface optimizations based on feedback that helps workers improve data quality before it is submitted and is well suited to complement post-hoc techniques. We propose the Precog system that explicitly supports such interface optimizations for common integrity constraints as well as more ambiguous text acquisition tasks where quality is ill-defined. We then develop the Segment-Predict-Explain pattern for detecting low-quality text segments and generating prescriptive explanations to help the worker improve their text input. Our unique…
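The abstract mentions pre-hoc feedback for common integrity constraints but does not say how Precog implements it. As a minimal sketch, assuming a form-style task whose constraints are simple predicates over one record, pre-hoc feedback amounts to evaluating every constraint before submission and showing a prescriptive message beside each violated field. The field names, the `CONSTRAINTS` list, and the `prehoc_feedback` and `try_submit` helpers are hypothetical, not Precog's API.

```python
import re
from datetime import date
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]

# Hypothetical constraint set: (field to flag, predicate over the record,
# prescriptive message shown to the worker when the predicate fails).
CONSTRAINTS: List[Tuple[str, Callable[[Record], bool], str]] = [
    ("zip",
     lambda r: re.fullmatch(r"\d{5}", str(r.get("zip", ""))) is not None,
     "Enter a 5-digit ZIP code, e.g. 10027."),
    ("price",
     lambda r: isinstance(r.get("price"), (int, float)) and r["price"] >= 0,
     "Enter the price as a non-negative number."),
    ("close_date",
     lambda r: isinstance(r.get("open_date"), date)
     and isinstance(r.get("close_date"), date)
     and r["close_date"] > r["open_date"],
     "The closing date must come after the opening date."),
]


def prehoc_feedback(record: Record) -> Dict[str, str]:
    """Check every constraint before submission; an empty dict means the record is clean."""
    return {field: msg for field, ok, msg in CONSTRAINTS if not ok(record)}


def try_submit(record: Record) -> bool:
    """Print feedback for each offending field; return True only if nothing is violated."""
    problems = prehoc_feedback(record)
    for field, msg in problems.items():
        print(f"[{field}] {msg}")
    return not problems


if __name__ == "__main__":
    draft = {"zip": "1002", "price": 12.5,
             "open_date": date(2024, 5, 1), "close_date": date(2024, 4, 1)}
    print(try_submit(draft))  # flags zip and close_date, then prints False
    fixed = {**draft, "zip": "10027", "close_date": date(2024, 6, 1)}
    print(try_submit(fixed))
```

Running the checks before the worker submits means a violation costs a quick edit in the interface rather than an extra post-hoc crowdsourced verification task, which is the cost and latency the abstract argues against.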
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Data Stream Mining Techniques · Data Quality and Management
