Listening to Affected Communities to Define Extreme Speech: Dataset and Experiments

Antonis Maronikolakis; Axel Wisiorek; Leah Nann; Haris Jabbar; Sahana Udupa; Hinrich Schuetze

arXiv:2203.11764 · cs.CL · March 23, 2022 · 1 citation

Open Access · 1 Repo

TL;DR

This paper introduces XTREMESPEECH, a multilingual hate speech dataset created with direct input from affected communities, aiming to improve hate speech detection and understanding across diverse cultural contexts.

Contribution

It presents a new dataset involving affected communities in data collection, along with novel tasks, baselines, and analysis of cross-country transferability and model interpretability.

Findings

01 Cross-country training is generally not feasible because of cultural differences between countries.

02 Involving affected communities in collection and annotation yields a dataset more representative of actually occurring online speech.

03 Interpretability analysis sheds light on model decision processes.

Abstract

Building on current work on multilingual hate speech (e.g., Ousidhoum et al. (2019)) and hate speech reduction (e.g., Sap et al. (2020)), we present XTREMESPEECH, a new hate speech dataset containing 20,297 social media passages from Brazil, Germany, India and Kenya. The key novelty is that we directly involve the affected communities in collecting and annotating the data - as opposed to giving companies and governments control over defining and combatting hate speech. This inclusive approach results in datasets more representative of actually occurring online speech and is likely to facilitate the removal of the social media content that marginalized communities view as causing the most harm. Based on XTREMESPEECH, we establish novel tasks with accompanying baselines, provide evidence that cross-country training is generally not feasible due to cultural differences between countries…

Code & Models

Repositories

antmarakis/xtremespeech
Official

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Social Media and Politics