It's All in the Name: Mitigating Gender Bias with Name-Based Counterfactual Data Substitution
Rowan Hall Maudslay, Hila Gonen, Ryan Cotterell, Simone Teufel

TL;DR
This paper compares gender bias mitigation methods in word embeddings, introducing name-based counterfactual data substitution and the Names Intervention to effectively reduce both direct and indirect gender bias.
Contribution
It proposes novel name-based counterfactual data substitution techniques that outperform existing projection-based methods in reducing gender bias in word embeddings.
Findings
CDA variants outperform projection-based methods at drawing non-biased gender analogies by an average of 19% across the English Gigaword and Wikipedia corpora.
CDA/S with the Names Intervention reduces how strongly previously biased words cluster by gender (cluster purity) by 49%.
The proposed methods effectively mitigate both direct and indirect gender bias.
Abstract
This paper treats gender bias latent in word embeddings. Previous mitigation attempts rely on the operationalisation of gender bias as a projection over a linear subspace. An alternative approach is Counterfactual Data Augmentation (CDA), in which a corpus is duplicated and augmented to remove bias, e.g. by swapping all inherently-gendered words in the copy. We perform an empirical comparison of these approaches on the English Gigaword and Wikipedia, and find that whilst both successfully reduce direct bias and perform well in tasks which quantify embedding quality, CDA variants outperform projection-based methods at the task of drawing non-biased gender analogies by an average of 19% across both corpora. We propose two improvements to CDA: Counterfactual Data Substitution (CDS), a variant of CDA in which potentially biased text is randomly substituted to avoid duplication, and the…
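The difference between CDA and CDS described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the word-pair dictionary `GENDER_PAIRS`, the function names, whitespace tokenisation, and the per-document `swap_probability` of 0.5 are all assumptions made for illustration.

```python
import random

# Hypothetical, tiny dictionary of inherently-gendered word pairs (illustration only).
GENDER_PAIRS = {
    "he": "she", "she": "he",
    "king": "queen", "queen": "king",
    "man": "woman", "woman": "man",
}


def swap_gendered_words(document: str) -> str:
    """Replace each lowercase gendered token with its counterpart."""
    return " ".join(GENDER_PAIRS.get(tok, tok) for tok in document.split())


def cda(corpus: list[str]) -> list[str]:
    """Counterfactual Data Augmentation: keep the corpus and append a swapped copy."""
    return list(corpus) + [swap_gendered_words(doc) for doc in corpus]


def cds(corpus: list[str], swap_probability: float = 0.5, seed: int = 0) -> list[str]:
    """Counterfactual Data Substitution: swap each document in place with some
    probability, so the corpus size is unchanged and no text is duplicated.
    The probability value is an assumption of this sketch."""
    rng = random.Random(seed)
    return [
        swap_gendered_words(doc) if rng.random() < swap_probability else doc
        for doc in corpus
    ]


corpus = ["he is a king", "she met a man", "the report was late"]
augmented = cda(corpus)      # twice as many documents as the input
substituted = cds(corpus)    # same number of documents as the input
```

In this sketch CDA doubles the corpus, whereas CDS leaves its size unchanged, which is the duplication the abstract says CDS is designed to avoid.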
