Does Putting a Linguist in the Loop Improve NLU Data Collection?

Alicia Parrish; William Huang; Omar Agha; Soo-Hwan Lee; Nikita Nangia; Alex Warstadt; Karmanya Aggarwal; Emily Allaway; Tal Linzen; Samuel R. Bowman

arXiv:2104.07179·cs.CL·April 16, 2021

Open Access 9 Models 1 Datasets

TL;DR

Involving linguists during crowdsourced NLU data collection yields more reliably challenging datasets and lets gaps in the data be identified and addressed while collection is still under way, but it does not necessarily improve out-of-domain model performance.

Contribution

This study introduces an iterative linguist-in-the-loop protocol for crowdsourced data collection and shows that real-time expert involvement produces datasets that are more reliably challenging than those from a baseline protocol.

Findings

01

Linguist-in-the-loop datasets are more reliably challenging.

02

Linguist involvement yields no significant improvement in out-of-domain performance.

03

Chatroom interaction between linguists and crowdworkers has no measurable effect.

Abstract

Many crowdsourced NLP datasets contain systematic gaps and biases that are identified only after data collection is complete. Identifying these issues from early data samples during crowdsourcing should make mitigation more efficient, especially when done iteratively. We take natural language inference as a test case and ask whether it is beneficial to put a linguist 'in the loop' during data collection to dynamically identify and address gaps in the data by introducing novel constraints on the task. We directly compare three data collection protocols: (i) a baseline protocol, (ii) a linguist-in-the-loop intervention with iteratively-updated constraints on the task, and (iii) an extension of linguist-in-the-loop that provides direct interaction between linguists and crowdworkers via a chatroom. The datasets collected with linguist involvement are more reliably challenging than baseline,…

Code & Models

Datasets

MoritzLaurer/multilingual-NLI-26lang-2mil7
dataset · 889 dl

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Mobile Crowdsensing and Crowdsourcing