How Far Can It Go?: On Intrinsic Gender Bias Mitigation for Text Classification

Ewoenam Tokpo; Pieter Delobelle; Bettina Berendt; Toon Calders

arXiv:2301.12855 · cs.CL · January 31, 2023 · 1 citation

TL;DR

This paper investigates how intrinsic gender bias mitigation strategies in language models affect downstream text classification fairness, revealing that these strategies often hide bias rather than eliminate it and are insufficient alone.

Contribution

The study provides a probe for evaluating how intrinsic bias mitigation affects downstream tasks, and shows that intrinsic mitigation strategies have limited effectiveness in reducing extrinsic bias.

Findings

01

Intrinsic mitigation techniques often hide gender bias without removing it.

02

Mitigation strategies can fool some bias metrics but not all.

03

Intrinsic techniques alone do not consistently reduce extrinsic bias.

Abstract

To mitigate gender bias in contextualized language models, different intrinsic mitigation strategies have been proposed, alongside many bias metrics. Considering that the end use of these language models is for downstream tasks like text classification, it is important to understand how these intrinsic bias mitigation strategies actually translate to fairness in downstream tasks and the extent of this. In this work, we design a probe to investigate the effects that some of the major intrinsic gender bias mitigation strategies have on downstream text classification tasks. We discover that instead of resolving gender bias, intrinsic mitigation techniques and metrics are able to hide it in such a way that significant gender information is retained in the embeddings. Furthermore, we show that each mitigation technique is able to hide the bias from some of the intrinsic bias measures but not…
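The idea of probing embeddings for retained gender information can be illustrated with a toy example. The sketch below is not the authors' probe: it assumes synthetic Gaussian vectors in place of real language-model embeddings, a hypothetical `shift` parameter that controls how much gender signal leaks into one dimension, and a simple nearest-centroid classifier as the probe. If the probe predicts the gender label well above chance on held-out vectors, the embeddings still carry gender information, whatever an intrinsic metric may report.

```python
import random


def make_embeddings(n, dim, shift, rng):
    """Synthetic stand-in for embeddings (an assumption, not real model output).

    Each vector is unit Gaussian noise; dimension 0 is moved by +shift for
    label 1 and -shift for label 0, so shift=0.0 means no gender signal.
    """
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        vec[0] += shift if label == 1 else -shift
        data.append((vec, label))
    return data


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def fit_probe(train):
    """Fit the probe: one centroid per gender label."""
    c0 = centroid([v for v, y in train if y == 0])
    c1 = centroid([v for v, y in train if y == 1])
    return c0, c1


def predict(probe, vec):
    """Assign the label of the closer centroid (squared Euclidean distance)."""
    c0, c1 = probe
    d0 = sum((a - b) ** 2 for a, b in zip(vec, c0))
    d1 = sum((a - b) ** 2 for a, b in zip(vec, c1))
    return 1 if d1 < d0 else 0


def probe_accuracy(shift, seed=0, n_train=400, n_test=400, dim=8):
    """Train on one synthetic sample and report accuracy on a held-out one."""
    rng = random.Random(seed)
    train = make_embeddings(n_train, dim, shift, rng)
    test = make_embeddings(n_test, dim, shift, rng)
    probe = fit_probe(train)
    correct = sum(predict(probe, v) == y for v, y in test)
    return correct / n_test


leaky = probe_accuracy(shift=1.0)  # gender signal still present
clean = probe_accuracy(shift=0.0)  # no gender signal at all
print(f"probe accuracy with retained signal: {leaky:.2f}")
print(f"probe accuracy without signal:       {clean:.2f}")
```

In this toy setting the probe should score clearly above chance when `shift` is non-zero and close to chance when it is zero. With real models, the vectors would come from the original and the debiased language model, and the comparison of interest is whether mitigation actually lowers the probe's accuracy.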

Code & Models

Repositories

ewoet/intrinsic-gender-probe
PyTorch · Official

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Ethics and Social Impacts of AI · Text Readability and Simplification

Methods: None