Hate Speech Detection with Generalizable Target-aware Fairness

Tong Chen; Danny Wang; Xurong Liang; Marten Risius; Gianluca Demartini; Hongzhi Yin

arXiv:2406.00046·cs.CL·June 12, 2024

TL;DR

This paper introduces GetFair, a novel method for hate speech detection that maintains fairness across known and unseen targeted groups by adversarially removing target-related biases using a hypernetwork-based filtering approach.

Contribution

GetFair is the first approach to generalize target-aware fairness in hate speech detection to unseen groups using a hypernetwork to generate adaptive filters.
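
To make the idea of a hypernetwork that generates adaptive filters more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the linear hypernetwork, the function names (`make_hypernetwork`, `generate_filter`, `apply_filter`), the dimensions, and the toy embeddings are all assumptions chosen for illustration, and the adversarial training that the paper uses to remove target-related bias is omitted entirely.

```python
import random


def make_hypernetwork(target_dim, hidden_dim, seed=0):
    """Hypothetical linear hypernetwork (an assumption, not the paper's design).

    Returns one weight row per generated filter parameter, so a target
    embedding of size target_dim maps to hidden_dim * hidden_dim values.
    """
    rng = random.Random(seed)
    n_out = hidden_dim * hidden_dim
    return [[rng.gauss(0.0, 0.1) for _ in range(target_dim)] for _ in range(n_out)]


def generate_filter(hyper_weights, target_embedding, hidden_dim):
    """Produce a hidden_dim x hidden_dim filter matrix from a target embedding."""
    flat = [
        sum(w * t for w, t in zip(row, target_embedding))
        for row in hyper_weights
    ]
    # Reshape the flat parameter vector into rows of the filter matrix.
    return [flat[i * hidden_dim:(i + 1) * hidden_dim] for i in range(hidden_dim)]


def apply_filter(filter_matrix, post_repr):
    """Apply the generated filter to a post representation (matrix-vector product)."""
    return [sum(f * x for f, x in zip(row, post_repr)) for row in filter_matrix]


# Toy inputs (made up): one embedding standing in for a target seen in
# training, and one standing in for a target that was not.
hyper = make_hypernetwork(target_dim=3, hidden_dim=4)
seen_target = [1.0, 0.0, 0.0]
unseen_target = [0.2, 0.5, 0.3]
post = [0.5, -1.0, 0.25, 2.0]

filtered_seen = apply_filter(generate_filter(hyper, seen_target, 4), post)
filtered_unseen = apply_filter(generate_filter(hyper, unseen_target, 4), post)
```

The point of the sketch is that the filter is a function of the target embedding, not a fixed per-target parameter table, so any target that can be embedded gets a filter, including one never seen during training.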

Findings

01. GetFair outperforms existing methods on out-of-sample targets.

02. The hypernetwork effectively generates target-specific filters.

03. GetFair maintains fairness and accuracy across diverse and unseen groups.

Abstract

To counter the side effect brought by the proliferation of social media platforms, hate speech detection (HSD) plays a vital role in halting the dissemination of toxic online posts at an early stage. However, given the ubiquitous topical communities on social media, a trained HSD classifier easily becomes biased towards specific targeted groups (e.g., female and black people), where a high rate of false positive/negative results can significantly impair public trust in the fairness of content moderation mechanisms, and eventually harm the diversity of online society. Although existing fairness-aware HSD methods can smooth out some discrepancies across targeted groups, they are mostly specific to a narrow selection of targets that are assumed to be known and fixed. This inevitably prevents those methods from generalizing to real-world use cases where new targeted groups constantly emerge…
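
The discrepancy in false positive rates across targeted groups that the abstract describes can be quantified in several ways. The sketch below shows one simple option, the gap between the highest and lowest per-group false positive rate. This is an illustrative assumption, not necessarily the fairness metric used in the paper, and the function names, group names, labels, and predictions are all made up.

```python
from collections import defaultdict


def false_positive_rates(labels, preds, groups):
    """Per-group false positive rate: share of non-hateful posts (label 0)
    that the classifier flags as hateful (prediction 1)."""
    false_pos = defaultdict(int)
    negatives = defaultdict(int)
    for y, p, g in zip(labels, preds, groups):
        if y == 0:
            negatives[g] += 1
            if p == 1:
                false_pos[g] += 1
    # Only groups with at least one negative example appear in the result.
    return {g: false_pos[g] / negatives[g] for g in negatives}


def fpr_gap(rates):
    """Largest minus smallest per-group false positive rate (0 means equal)."""
    return max(rates.values()) - min(rates.values())


# Toy data (hypothetical): 1 = hateful, 0 = not hateful.
labels = [0, 0, 0, 0, 1, 1]
preds = [1, 0, 0, 0, 1, 0]
groups = ["group_a", "group_a", "group_b", "group_b", "group_a", "group_b"]

rates = false_positive_rates(labels, preds, groups)
gap = fpr_gap(rates)
```

On this toy data, one of the two non-hateful posts about `group_a` is flagged and neither of the two about `group_b` is, so the per-group rates are 0.5 and 0.0 and the gap is 0.5.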

Code & Models

Repositories

xurong-liang/GetFair (PyTorch, official)

Taxonomy

Topics: Hate Speech and Cyberbullying Detection

Methods: HyperNetwork