On Bias and Fairness in NLP: Investigating the Impact of Bias and Debiasing in Language Models on the Fairness of Toxicity Detection
Fatma Elsafoury and Stamos Katsigiannis

TL;DR
This paper investigates how different biases in language models affect toxicity detection fairness and evaluates debiasing techniques, finding that overamplification bias significantly impacts fairness and can be mitigated through targeted fine-tuning.
Contribution
It identifies key sources of bias in NLP models affecting toxicity detection fairness and evaluates debiasing methods, providing practical guidelines for improving fairness.
Findings
Overamplification bias has the strongest impact on toxicity detection fairness.
Fine-tuning on balanced datasets improves fairness.
Debiasing reduces bias effects but varies by bias type.
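To make "fairness" in the findings above concrete, here is a minimal sketch of one common way to quantify it for a toxicity classifier: the gap in false positive rates between identity groups. The summary does not say which fairness metric the paper uses, so the metric choice, the function names (`false_positive_rates`, `fpr_gap`), and the toy data are all illustrative assumptions rather than the authors' method.

```python
from collections import defaultdict


def false_positive_rates(labels, preds, groups):
    """Per-group false positive rate: share of non-toxic texts flagged as toxic.

    labels, preds: 0 = non-toxic, 1 = toxic. groups: identity group per example.
    (Hypothetical helper; not taken from the paper.)
    """
    false_pos = defaultdict(int)
    negatives = defaultdict(int)
    for y, p, g in zip(labels, preds, groups):
        if y == 0:
            negatives[g] += 1
            if p == 1:
                false_pos[g] += 1
    return {g: false_pos[g] / negatives[g] for g in negatives}


def fpr_gap(labels, preds, groups):
    """Largest difference in false positive rate between any two groups."""
    rates = false_positive_rates(labels, preds, groups)
    return max(rates.values()) - min(rates.values())


# Toy data (made up for illustration): two identity groups, "a" and "b".
labels = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
preds = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
groups = ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"]

rates = false_positive_rates(labels, preds, groups)
gap = fpr_gap(labels, preds, groups)
print(rates, gap)
```

A gap near zero means non-toxic texts from each group are wrongly flagged at similar rates. Comparing the gap before and after a debiasing step, such as fine-tuning on a balanced dataset, is one way to check whether that step improved fairness under this particular metric.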
Abstract
Language models are the new state-of-the-art natural language processing (NLP) models and they are being increasingly used in many NLP tasks. Even though there is evidence that language models are biased, the impact of that bias on the fairness of downstream NLP tasks is still understudied. Furthermore, although numerous debiasing methods have been proposed in the literature, the impact of bias removal methods on the fairness of NLP tasks is also understudied. In this work, we investigate three different sources of bias in NLP models, i.e. representation bias, selection bias and overamplification bias, and examine how they impact the fairness of the downstream task of toxicity detection. Moreover, we investigate the impact of removing these biases using different bias removal techniques on the fairness of toxicity detection. Results show strong evidence that downstream sources of…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
