Optimizing Open-Ended Crowdsourcing: The Next Frontier in Crowdsourced Data Management
Aditya Parameswaran, Akash Das Sarma, Vipul Venkataraman

TL;DR
This paper surveys methods for optimizing open-ended crowdsourcing. It covers three challenges: distilling the right answer when workers disagree, separating the perspectives workers adopt when answering, and selecting among the open-ended operators suited to a problem.
Contribution
It gives an overview of techniques for formally reasoning about and optimizing open-ended crowdsourcing, highlighting the approaches the authors have found effective in their own work.
Findings
Effective methods for answer aggregation in open-ended tasks
Strategies for distinguishing diverse worker perspectives
Techniques for selecting suitable open-ended operators
Abstract
Crowdsourcing is the primary means to generate training data at scale, and when combined with sophisticated machine learning algorithms, crowdsourcing is an enabler for a variety of emergent automated applications impacting all spheres of our lives. This paper surveys the emerging field of formally reasoning about and optimizing open-ended crowdsourcing, a popular and crucially important, but severely understudied class of crowdsourcing: the next frontier in crowdsourced data management. The underlying challenges include distilling the right answer when none of the workers agree with each other, teasing apart the various perspectives adopted by workers when answering tasks, and effectively selecting between the many open-ended operators appropriate for a problem. We describe the approaches that we've found to be effective for open-ended crowdsourcing, drawing from our experiences in…
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Data Stream Mining Techniques · Privacy-Preserving Technologies in Data
