GKnow: Measuring the Entanglement of Gender Bias and Factual Gender
Leonor Veloso, Hinrich Schütze

TL;DR
GKnow is a benchmark designed to evaluate and analyze the entanglement of gender bias and factual gender knowledge in language models, revealing challenges in debiasing methods and the complexity of gender-related predictions.
Contribution
We introduce GKnow, a new benchmark for assessing gender knowledge and bias, and analyze the neural circuits responsible for gendered predictions in language models.
Findings
Gender bias and factual gender are highly entangled at circuit and neuron levels.
Neuron ablation does not reliably reduce gender bias without affecting factual gender knowledge.
Existing benchmarks may obscure decreases in factual gender understanding after debiasing.
Abstract
Recent works have analyzed the impact of individual components of neural networks on gendered predictions, often with a focus on mitigating gender bias. However, mechanistic interpretations of gender tend to (i) focus on a very specific gender-related task, such as gendered pronoun prediction, or (ii) fail to distinguish between the production of factually gendered outputs (the correct assumption of gender given a word that carries gender as a semantic property) and gender biased outputs (based on a stereotype). To address these issues, we curate GKnow, a benchmark to assess gender knowledge and gender bias in language models across different types of gender-related predictions. GKnow allows us to identify and analyze circuits and individual neurons responsible for gendered predictions. We test the impact of neuron ablation on benchmarks for disentangling stereotypical and factual…
