HateDebias: On the Diversity and Variability of Hate Speech Debiasing

Hongyan Wu; Zhengming Chen; Zijian Li; Nankai Lin; Lianxi Wang; Shengyi Jiang; Aimin Yang

arXiv:2406.04876 · cs.CL · August 27, 2025

Open Access

TL;DR

This paper introduces HateDebias, a benchmark for hate speech debiasing that considers diversity and variability of biases, along with a continual debiasing framework to improve fairness in dynamic real-world environments.

Contribution

It presents a new benchmark dataset capturing diverse and evolving biases in hate speech, and proposes a continual debiasing method to address dynamic bias challenges.

Findings

01. Our methods improve bias mitigation in dynamic scenarios.

02. The HateDebias benchmark reveals significant bias-related performance degradation.

03. Continual debiasing enhances fairness in real-world hate speech detection.

Abstract

Hate speech appears frequently on social media platforms and urgently needs to be controlled effectively. Alleviating the bias caused by hate speech can help resolve various ethical issues. Although existing research has constructed several datasets for hate speech detection, these datasets seldom consider the diversity and variability of bias, which leaves them far from real-world scenarios. To fill this gap, we propose a benchmark, HateDebias, to analyze the fairness of models in dynamically evolving environments. Specifically, to capture the diversity of biases, we collect hate speech data with different types of biases from real-world scenarios. To further simulate the variability found in real-world scenarios (i.e., changes in the bias attributes of the datasets), we construct a dataset that follows the continual learning setting and evaluate the detection accuracy of models on the HateDebias,…
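The evaluation setting the abstract describes, in which the bias attribute changes from one stage to the next, could be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' benchmark code: the function names, the task dictionary layout, and the use of a per-group accuracy gap as the fairness measure are all hypothetical, and the paper may use different metrics.

```python
from collections import defaultdict


def group_accuracy_gap(labels, preds, groups):
    """Return (overall accuracy, largest accuracy gap between groups).

    The accuracy gap is an assumed fairness measure chosen for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p, g in zip(labels, preds, groups):
        total[g] += 1
        correct[g] += int(y == p)
    per_group = {g: correct[g] / total[g] for g in total}
    overall = sum(correct.values()) / sum(total.values())
    gap = max(per_group.values()) - min(per_group.values())
    return overall, gap


def continual_evaluation(tasks, train_step, predict):
    """Train on tasks in order; after each stage, evaluate on every task seen so far.

    Each task is a dict with hypothetical keys: "name", "texts", "labels",
    and "groups" (the value of that task's bias attribute for each example).
    """
    history = []
    for stage, task in enumerate(tasks):
        train_step(task)  # caller-supplied update on the current task only
        for seen in tasks[: stage + 1]:
            preds = [predict(x) for x in seen["texts"]]
            acc, gap = group_accuracy_gap(seen["labels"], preds, seen["groups"])
            history.append(
                {"stage": stage, "task": seen["name"], "accuracy": acc, "gap": gap}
            )
    return history
```

Re-evaluating earlier tasks after each new stage makes it possible to see whether accuracy or the group gap on an earlier bias attribute degrades once the model has been trained on a later one.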

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Swearing, Euphemism, Multilingualism · Freedom of Expression and Defamation