The Unseen Targets of Hate -- A Systematic Review of Hateful Communication Datasets

Zehui Yu, Indira Sen, Dennis Assenmacher, Mattia Samory, Leon Fröhling, Christina Dahn, Debora Nozza, Claudia Wagner

arXiv:2405.08562 · cs.CL · June 14, 2024

TL;DR

This systematic review analyzes hate speech datasets to reveal biases in target identity representation, highlighting gaps and positive trends in diversifying hate detection data for machine learning.

Contribution

The review provides a comprehensive assessment of dataset biases and diversity in hate speech research, emphasizing the importance of inclusive data for fair ML-based content moderation.

Findings

01. Skewed representation of target identities in datasets
02. Mismatch between conceptualized and included targets
03. Positive trend towards diversification of datasets

Abstract

Machine learning (ML)-based content moderation tools are essential for keeping online spaces free from hateful communication. Yet ML tools can only be as capable as the quality of their training data allows. While there is increasing evidence that they underperform in detecting hateful communication directed towards specific identities and may discriminate against them, we know surprisingly little about the provenance of such bias. To fill this gap, we present a systematic review of the datasets for the automated detection of hateful communication introduced over the past decade, and unpack the quality of the datasets in terms of the identities that they embody: those of the targets of hateful communication that the data curators focused on, as well as those unintentionally included in the datasets. We find, overall, a skewed representation of selected target identities and…

Code & Models

Repositories

uzeui/HateComm_Review (Official)
