How We Define Harm Impacts Data Annotations: Explaining How Annotators Distinguish Hateful, Offensive, and Toxic Comments

Angela Schöpke-Gonzalez; Siqi Wu; Sagar Kumar; Paul J. Resnick; Libby Hemphill

arXiv:2309.15827·cs.CL·September 28, 2023·1 cites

TL;DR

This study investigates how the definitions of harm concepts such as 'hateful', 'offensive', and 'toxic' influence annotation outcomes, and argues that content moderation datasets need precisely defined harm concepts.

Contribution

The paper provides empirical evidence that harm concepts are not interchangeable and highlights the need for clear, context-specific definitions in annotation tasks for harmful content detection.

Findings

01

Annotators distinguish between 'hateful', 'offensive', and 'toxic' rather than treating them as interchangeable.

02

Features of harm definitions and annotator characteristics influence annotation differences.

03

Researchers should specify harm concepts clearly and consider context-specific definitions in content moderation datasets.

Abstract

Computational social science research has made advances in machine learning and natural language processing that support content moderators in detecting harmful content. These advances often rely on training datasets annotated by crowdworkers for harmful content. In designing instructions for annotation tasks to generate training data for these algorithms, researchers often treat the harm concepts that we train algorithms to detect - 'hateful', 'offensive', 'toxic', 'racist', 'sexist', etc. - as interchangeable. In this work, we studied whether the way that researchers define 'harm' affects annotation outcomes. Using Venn diagrams, information gain comparisons, and content analyses, we reveal that annotators do not use the concepts 'hateful', 'offensive', and 'toxic' interchangeably. We identify that features of harm definitions and annotators' individual characteristics explain much of…
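The abstract names Venn diagrams and information gain comparisons as analysis tools. As a rough illustration of what those two measures compute, here is a minimal sketch on toy binary labels. It is not the authors' pipeline: the function names (`venn_regions`, `entropy`, `information_gain`) and the example data are hypothetical, and the paper may define its comparisons differently.

```python
from collections import Counter
from math import log2


def venn_regions(a, b):
    """Count items in each region of a two-set Venn diagram.

    `a` and `b` are sets of comment IDs flagged under two harm concepts
    (hypothetical inputs, e.g. 'hateful' vs. 'offensive').
    """
    return {
        "only_a": len(a - b),
        "only_b": len(b - a),
        "both": len(a & b),
    }


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def information_gain(target, feature):
    """Reduction in entropy of `target` after conditioning on `feature`.

    Both arguments are equal-length sequences of labels for the same comments.
    """
    n = len(target)
    conditional = 0.0
    for value in set(feature):
        subset = [t for t, f in zip(target, feature) if f == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(target) - conditional


if __name__ == "__main__":
    # Toy annotations for eight comments (assumed data, 1 = flagged).
    hateful = [1, 1, 0, 0, 1, 0, 0, 0]
    offensive = [1, 1, 1, 1, 0, 0, 0, 0]

    hateful_ids = {i for i, v in enumerate(hateful) if v}
    offensive_ids = {i for i, v in enumerate(offensive) if v}

    print(venn_regions(hateful_ids, offensive_ids))
    print(information_gain(hateful, offensive))
```

If two concepts were used interchangeably, the "only" regions would be near empty and knowing one label would remove almost all uncertainty about the other; sizeable non-overlapping regions and a low information gain point the other way.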

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning