Hateful Messages: A Conversational Data Set of Hate Speech produced by Adolescents on Discord

Jan Fillies; Silvio Peikert; Adrian Paschke

arXiv:2309.01413 · cs.CL · September 6, 2023 · 1 cites

TL;DR

This paper introduces a new annotated dataset of youth language hate speech from Discord, aiming to improve automated classification by addressing biases related to adolescent speech patterns.

Contribution

It provides a modern, anonymized dataset of 88,395 chat messages with hate speech annotations and age labels, focusing on youth language bias in hate speech detection.

Findings

01. 6.42% of messages identified as hate speech

02. Average user age under 20 years

03. Dataset enhances understanding of youth language bias

Abstract

With the rise of social media, a rise of hateful content can be observed. Even though the understanding and definitions of hate speech vary, platforms, communities, and legislature all acknowledge the problem. Adolescents are a new and active group of social media users, and the majority of them experience or witness online hate speech. Research in the field of automated hate speech classification has been on the rise and focuses on aspects such as bias, generalizability, and performance. To increase generalizability and performance, it is important to understand biases within the data. This research addresses the bias of youth language within hate speech classification and contributes a modern, anonymized hate speech youth language data set consisting of 88,395 annotated chat messages. The data set consists of publicly available online messages from the…

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Internet Traffic Analysis and Secure E-voting · Social Media and Politics