TL;DR
This study evaluates various crowdsourcing strategies for collecting challenging natural language understanding data, finding that iterative protocols with expert feedback significantly improve data difficulty compared to other methods.
Contribution
It introduces and tests an iterative crowdsourcing protocol with expert assessments, demonstrating its effectiveness in creating more challenging NLU datasets.
Findings
Iterative protocol with expert feedback produces more challenging data.
Worker explanations alone do not increase data difficulty.
Crowdsourced worker qualification is less effective than expert-based assessment.
Abstract
Crowdsourcing is widely used to create data for common natural language understanding tasks. Despite the importance of these datasets for measuring and refining model understanding of language, there has been little focus on the crowdsourcing methods used for collecting the datasets. In this paper, we compare the efficacy of interventions that have been proposed in prior work as ways of improving data quality. We use multiple-choice question answering as a testbed and run a randomized trial by assigning crowdworkers to write questions under one of four different data collection protocols. We find that asking workers to write explanations for their examples is an ineffective stand-alone strategy for boosting NLU example difficulty. However, we find that training crowdworkers, and then using an iterative process of collecting data, sending feedback, and qualifying workers based on expert…
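The abstract describes two procedural pieces: randomly assigning crowdworkers to one of four collection protocols, and an iterative loop of collecting data, sending feedback, and re-qualifying workers based on expert judgments. The Python sketch below illustrates that structure under loudly stated assumptions. The protocol labels, `assign_protocols`, `run_iterative_protocol`, the `collect` and `expert_score` callbacks, the 0.5 threshold, and the round count are all hypothetical and are not taken from the paper. It is not the authors' implementation.

```python
import random

# Hypothetical labels for the four protocols; the abstract only says there are four.
PROTOCOLS = ["baseline", "justification", "crowd", "expert"]


def assign_protocols(worker_ids, seed=0):
    """Randomly assign each worker to one protocol with balanced group sizes.

    Shuffles the workers with a seeded RNG, then deals them round-robin
    across PROTOCOLS (a simple balanced randomization, assumed here).
    """
    rng = random.Random(seed)
    shuffled = list(worker_ids)
    rng.shuffle(shuffled)
    return {w: PROTOCOLS[i % len(PROTOCOLS)] for i, w in enumerate(shuffled)}


def run_iterative_protocol(workers, collect, expert_score, rounds=3, threshold=0.5):
    """Hypothetical iterative loop: collect, score, give feedback, re-qualify.

    collect(worker, round_index) returns that worker's examples for the round;
    expert_score(examples) stands in for an expert judgment in [0, 1].
    Only workers scoring at or above `threshold` continue to the next round.
    """
    qualified = set(workers)
    history = []  # per-round scores; in practice these would be sent as feedback
    for r in range(rounds):
        batch = {w: collect(w, r) for w in sorted(qualified)}
        scores = {w: expert_score(examples) for w, examples in batch.items()}
        history.append(scores)
        qualified = {w for w, s in scores.items() if s >= threshold}
        if not qualified:
            break  # nobody qualified; stop early
    return qualified, history


if __name__ == "__main__":
    groups = assign_protocols([f"w{i}" for i in range(8)], seed=1)
    print(groups)
```

Round-robin dealing after a shuffle keeps the four groups equal in size, which is one common way to run a balanced randomized trial; the paper may have used a different assignment scheme.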
