How We Define Harm Impacts Data Annotations: Explaining How Annotators Distinguish Hateful, Offensive, and Toxic Comments
Angela Schöpke-Gonzalez, Siqi Wu, Sagar Kumar, Paul J. Resnick, Libby Hemphill

TL;DR
This study investigates how definitions of harm concepts such as 'hateful', 'offensive', and 'toxic' influence annotation outcomes, emphasizing the importance of precise harm definitions in content moderation datasets.
Contribution
The paper provides empirical evidence that harm concepts are not interchangeable and highlights the need for clear, context-specific definitions in annotation tasks for harmful content detection.
Findings
Annotators distinguish between 'hateful', 'offensive', and 'toxic' rather than treating them as interchangeable.
Features of harm definitions and annotator characteristics influence annotation differences.
Researchers should specify harm concepts clearly and consider context-specific definitions in content moderation datasets.
Abstract
Computational social science research has made advances in machine learning and natural language processing that support content moderators in detecting harmful content. These advances often rely on training datasets annotated by crowdworkers for harmful content. In designing instructions for annotation tasks to generate training data for these algorithms, researchers often treat the harm concepts that we train algorithms to detect - 'hateful', 'offensive', 'toxic', 'racist', 'sexist', etc. - as interchangeable. In this work, we studied whether the way that researchers define 'harm' affects annotation outcomes. Using Venn diagrams, information gain comparisons, and content analyses, we reveal that annotators do not use the concepts 'hateful', 'offensive', and 'toxic' interchangeably. We identify that features of harm definitions and annotators' individual characteristics explain much of…
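The abstract names Venn diagrams and information gain comparisons as two ways of testing whether annotators treat 'hateful', 'offensive', and 'toxic' as interchangeable. The sketch below is a generic illustration of those two ideas, not the authors' code. It assumes each comment has one binary label per concept (for example, after a majority vote across annotators), and the function names `entropy`, `information_gain`, and `venn_regions` are hypothetical. Information gain here uses the standard definition IG(Y; X) = H(Y) - H(Y | X), measured in bits.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def information_gain(target, given):
    """IG = H(target) - H(target | given) for two paired label sequences.

    Assumption: labels are aligned per comment, e.g. target[i] is the
    'toxic' label and given[i] the 'hateful' label for comment i.
    """
    if len(target) != len(given):
        raise ValueError("label sequences must be the same length")
    n = len(target)
    conditional = 0.0
    # Weight the entropy of the target within each value of `given`
    # by how often that value occurs.
    for value, count in Counter(given).items():
        subset = [t for t, g in zip(target, given) if g == value]
        conditional += (count / n) * entropy(subset)
    return entropy(target) - conditional


def venn_regions(**concepts):
    """Count comments per Venn region.

    Each key is the tuple of concept names a comment was labeled with;
    the empty tuple collects comments labeled with none of them.
    """
    names = list(concepts)
    rows = zip(*(concepts[name] for name in names))
    return Counter(
        tuple(name for name, flag in zip(names, row) if flag) for row in rows
    )


if __name__ == "__main__":
    # Hypothetical toy labels for six comments (not data from the paper).
    hateful = [1, 1, 0, 0, 0, 1]
    offensive = [1, 1, 1, 0, 0, 0]
    toxic = [1, 1, 1, 1, 0, 1]
    print(venn_regions(hateful=hateful, offensive=offensive, toxic=toxic))
    print(information_gain(toxic, hateful))
```

If two concepts were truly interchangeable, nearly all comments would fall in the shared or empty Venn regions, and knowing one label would remove almost all uncertainty about the other (information gain close to H(target)). The abstract does not state how the authors aggregated annotations or computed information gain, so treat this only as an illustration of the general technique.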
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning
