How Far Can It Go?: On Intrinsic Gender Bias Mitigation for Text Classification
Ewoenam Tokpo, Pieter Delobelle, Bettina Berendt, Toon Calders

TL;DR
This paper investigates how intrinsic gender bias mitigation strategies in language models affect downstream text classification fairness, revealing that these strategies often hide bias rather than eliminate it and are insufficient alone.
Contribution
The study provides a probe for evaluating how intrinsic bias mitigation affects downstream tasks and shows that such mitigation has limited effectiveness in reducing extrinsic bias.
Findings
Intrinsic mitigation techniques often hide gender bias without removing it.
Each mitigation technique can hide bias from some intrinsic bias measures but not from all of them.
Intrinsic techniques alone do not consistently reduce extrinsic bias.
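To make "extrinsic bias" concrete, here is a minimal sketch of one common way to quantify it for a text classifier: the gap in true positive rate (TPR) between two gender groups. This summary does not state which extrinsic metrics the paper uses, so the metric choice, the function names, the group labels "F" and "M", and the toy data are all illustrative assumptions rather than the authors' setup.

```python
from collections import defaultdict


def true_positive_rates(y_true, y_pred, groups, positive=1):
    """Return {group: TPR} computed over examples whose true label is `positive`."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        if t == positive:
            totals[g] += 1
            if p == positive:
                hits[g] += 1
    # Groups with no positive examples are omitted (their TPR is undefined).
    return {g: hits[g] / totals[g] for g in totals}


def tpr_gap(y_true, y_pred, groups, group_a="F", group_b="M", positive=1):
    """TPR(group_a) - TPR(group_b); 0.0 means equal TPR for both groups."""
    rates = true_positive_rates(y_true, y_pred, groups, positive)
    return rates[group_a] - rates[group_b]


# Toy data (hypothetical): labels, classifier predictions, and the gender
# associated with each input text.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 1]
groups = ["M", "M", "M", "M", "F", "F", "F", "F", "M", "F"]

rates = true_positive_rates(y_true, y_pred, groups)
gap = tpr_gap(y_true, y_pred, groups)
print(rates, gap)
```

Under this sketch, comparing the gap for a classifier built on the original language model with the gap for one built on a mitigated model would indicate whether intrinsic mitigation carried over to the downstream task.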
Abstract
To mitigate gender bias in contextualized language models, different intrinsic mitigation strategies have been proposed, alongside many bias metrics. Considering that the end use of these language models is for downstream tasks like text classification, it is important to understand how these intrinsic bias mitigation strategies actually translate to fairness in downstream tasks and the extent of this. In this work, we design a probe to investigate the effects that some of the major intrinsic gender bias mitigation strategies have on downstream text classification tasks. We discover that instead of resolving gender bias, intrinsic mitigation techniques and metrics are able to hide it in such a way that significant gender information is retained in the embeddings. Furthermore, we show that each mitigation technique is able to hide the bias from some of the intrinsic bias measures but not…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Ethics and Social Impacts of AI · Text Readability and Simplification
Methods: None
