On Bias and Fairness in NLP: Investigating the Impact of Bias and Debiasing in Language Models on the Fairness of Toxicity Detection

Fatma Elsafoury and Stamos Katsigiannis

arXiv:2305.12829·cs.CL·April 29, 2024·1 cites

Open Access

TL;DR

This paper investigates how different biases in language models affect toxicity detection fairness and evaluates debiasing techniques, finding that overamplification bias significantly impacts fairness and can be mitigated through targeted fine-tuning.

Contribution

It identifies key sources of bias in NLP models affecting toxicity detection fairness and evaluates debiasing methods, providing practical guidelines for improving fairness.

Findings

01. Overamplification bias has the strongest impact on toxicity detection fairness.

02. Fine-tuning on balanced datasets improves fairness.

03. Debiasing reduces bias effects but varies by bias type.

Abstract

Language models are the new state-of-the-art natural language processing (NLP) models and are increasingly used in many NLP tasks. Although there is evidence that language models are biased, the impact of that bias on the fairness of downstream NLP tasks is still understudied. Furthermore, although numerous debiasing methods have been proposed in the literature, the impact of bias removal methods on the fairness of NLP tasks is also understudied. In this work, we investigate three different sources of bias in NLP models, namely representation bias, selection bias, and overamplification bias, and examine how they impact the fairness of the downstream task of toxicity detection. Moreover, we investigate the impact of removing these biases using different bias removal techniques on the fairness of toxicity detection. Results show strong evidence that downstream sources of…
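To make "fairness of toxicity detection" concrete, here is a minimal sketch of one common way to quantify it: the gap in false positive rates between identity groups, that is, how much more often non-toxic text mentioning one group is flagged as toxic than non-toxic text mentioning another. This is an illustrative assumption, not necessarily the metric used in the paper; the function names, group labels, and data below are hypothetical.

```python
from collections import defaultdict


def false_positive_rates(labels, preds, groups):
    """Return {group: FPR}, computed over non-toxic examples (label 0) only."""
    false_pos = defaultdict(int)
    negatives = defaultdict(int)
    for y, p, g in zip(labels, preds, groups):
        if y == 0:  # only non-toxic examples can be false positives
            negatives[g] += 1
            if p == 1:
                false_pos[g] += 1
    return {g: false_pos[g] / negatives[g] for g in negatives}


def fpr_gap(labels, preds, groups):
    """Largest difference in FPR between any two groups (0 means parity)."""
    rates = false_positive_rates(labels, preds, groups)
    return max(rates.values()) - min(rates.values())


# Toy, made-up data: 1 = toxic, 0 = non-toxic.
labels = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
preds = [1, 0, 1, 0, 1, 0, 0, 0, 1, 1]
# Hypothetical identity group mentioned in each example.
groups = ["group_a"] * 5 + ["group_b"] * 5

rates = false_positive_rates(labels, preds, groups)
gap = fpr_gap(labels, preds, groups)
print(rates, gap)
```

Under this assumed metric, a smaller gap after debiasing or after fine-tuning on a balanced dataset would indicate improved fairness.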

Taxonomy

Topics: Hate Speech and Cyberbullying Detection