Towards Fairness Assessment of Dutch Hate Speech Detection
Julie Bauer, Rishabh Kaushal, Thales Bertaglia, Adriana Iamnitchi

TL;DR
This paper evaluates the fairness of Dutch hate speech detection models, using counterfactual data generation and fairness metrics to identify challenges and suggest improvements for model fairness and performance.
Contribution
It introduces a Dutch social group term list, generates counterfactual data with LLMs, and assesses transformer models' fairness, addressing a gap in Dutch hate speech detection research.
Findings
Models fine-tuned with counterfactual data perform better on hate speech detection and fairness metrics.
Generating realistic counterfactuals is difficult, particularly with respect to Dutch grammar and contextual coherence.
Fairness improvements are achievable with counterfactual training.
Abstract
Numerous studies have proposed computational methods to detect hate speech online, yet most focus on the English language and emphasize model development. In this study, we evaluate the counterfactual fairness of hate speech detection models in the Dutch language, specifically examining the performance and fairness of transformer-based models. We make the following key contributions. First, we curate a list of Dutch Social Group Terms that reflect social context. Second, we generate counterfactual data for Dutch hate speech using LLMs and established strategies like Manual Group Substitution (MGS) and Sentence Log-Likelihood (SLL). Through qualitative evaluation, we highlight the challenges of generating realistic counterfactuals, particularly with Dutch grammar and contextual coherence. Third, we fine-tune baseline transformer-based models with counterfactual data and evaluate their…
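The substitution-based strategy named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the three-term list, the example sentence, the toy scorer, and the function names `substitute_groups` and `counterfactual_gap` are all assumptions made for illustration. The gap shown here, the mean absolute change in a model's score when only the group term changes, is one common way to quantify counterfactual fairness; the paper's exact metric may differ.

```python
import re
from typing import Callable, List

# Hypothetical stand-in for a social group term list; NOT the paper's curated list.
GROUP_TERMS: List[str] = ["moslims", "joden", "christenen"]


def substitute_groups(sentence: str, terms: List[str]) -> List[str]:
    """Swap each group term found in the sentence for every other term in the list."""
    counterfactuals: List[str] = []
    for term in terms:
        # Whole-word, case-insensitive match so substrings of longer words are left alone.
        pattern = re.compile(rf"\b{re.escape(term)}\b", flags=re.IGNORECASE)
        if not pattern.search(sentence):
            continue
        for other in terms:
            if other != term:
                counterfactuals.append(pattern.sub(other, sentence))
    return counterfactuals


def counterfactual_gap(
    sentence: str, counterfactuals: List[str], score: Callable[[str], float]
) -> float:
    """Mean absolute difference between the original score and each counterfactual score."""
    if not counterfactuals:
        return 0.0
    base = score(sentence)
    return sum(abs(base - score(c)) for c in counterfactuals) / len(counterfactuals)


def toy_score(text: str) -> float:
    """Toy scorer (assumption): deliberately biased toward one term to show a non-zero gap."""
    return 0.9 if "moslims" in text.lower() else 0.2


original = "De moslims in onze wijk organiseren een buurtfeest."
cfs = substitute_groups(original, GROUP_TERMS)
gap = counterfactual_gap(original, cfs, toy_score)
```

In practice `toy_score` would be replaced by the hate-class probability of a fine-tuned classifier. Plain string substitution like this ignores article and adjective agreement, which is one reason generated counterfactuals can come out ungrammatical in Dutch.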
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
