When Do Annotator Demographics Matter? Measuring the Influence of Annotator Demographics with the POPQUORN Dataset
Jiaxin Pei, David Jurgens

TL;DR
This paper introduces the POPQUORN dataset and shows that annotator demographics significantly influence labeling decisions in NLP tasks, which underscores the importance of annotator background diversity for reducing dataset bias.
Contribution
The study presents a dataset of 45,000 annotations from 1,484 annotators, sampled to be representative of the US population in sex, age, and race, and analyzes how annotator background factors affect annotation, emphasizing the need to account for annotator diversity in NLP.
Findings
Annotator demographics significantly influence labeling decisions.
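Background factors not previously considered in NLP, such as education, meaningfully affect annotations.
The study suggests collecting labels from a demographically balanced annotator pool to reduce dataset bias.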
Abstract
Annotators are not fungible. Their demographics, life experiences, and backgrounds all contribute to how they label data. However, NLP has only recently considered how annotator identity might influence their decisions. Here, we present POPQUORN (the POtato-Prolific dataset for QUestion-Answering, Offensiveness, text Rewriting, and politeness rating with demographic Nuance). POPQUORN contains 45,000 annotations from 1,484 annotators, drawn from a representative sample regarding sex, age, and race as the US population. Through a series of analyses, we show that annotators' background plays a significant role in their judgments. Further, our work shows that backgrounds not previously considered in NLP (e.g., education), are meaningful and should be considered. Our study suggests that understanding the background of annotators and collecting labels from a demographically balanced pool of…
Taxonomy
Topics: Misinformation and Its Impacts · Hate Speech and Cyberbullying Detection · Topic Modeling
