GKnow: Measuring the Entanglement of Gender Bias and Factual Gender

Leonor Veloso; Hinrich Schütze

arXiv:2605.12299·cs.CL·May 13, 2026

TL;DR

GKnow is a benchmark designed to evaluate and analyze the entanglement of gender bias and factual gender knowledge in language models, revealing challenges in debiasing methods and the complexity of gender-related predictions.

Contribution

We introduce GKnow, a new benchmark for assessing gender knowledge and bias, and analyze the neural circuits responsible for gendered predictions in language models.

Findings

01

Gender bias and factual gender are highly entangled at circuit and neuron levels.

02

Neuron ablation does not reliably reduce gender bias without affecting factual gender knowledge.

03

Existing benchmarks may obscure decreases in factual gender understanding after debiasing.
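To make the idea of neuron ablation in the findings above concrete, here is a minimal sketch on a toy two-layer MLP. It is not the authors' method or code: the network, the random weights, the two input sets, and the names `mlp_forward`, `gender_gap`, and `ablation_effect` are all hypothetical and chosen only for illustration. The sketch zeroes one hidden neuron at a time and measures how much the gap between a "male" and a "female" output logit shifts on two input sets standing in for factual and stereotypical prompts.

```python
import numpy as np

def mlp_forward(x, W_in, W_out, ablate=None):
    """Toy 2-layer MLP. `ablate` is an optional list of hidden-neuron indices to zero."""
    h = np.maximum(0.0, x @ W_in)  # ReLU hidden activations
    if ablate is not None:
        h = h.copy()
        h[..., ablate] = 0.0       # zero-ablation of the selected neurons
    return h @ W_out

def gender_gap(logits, male_idx=0, female_idx=1):
    """Difference between the 'male' and 'female' output logits (hypothetical indices)."""
    return logits[..., male_idx] - logits[..., female_idx]

# Random weights and inputs: placeholders, not real model parameters or benchmark data.
rng = np.random.default_rng(0)
W_in = rng.normal(size=(8, 16))
W_out = rng.normal(size=(16, 2))
factual_inputs = rng.normal(size=(5, 8))  # stand-in for factually gendered prompts
stereo_inputs = rng.normal(size=(5, 8))   # stand-in for stereotype-driven prompts

def ablation_effect(inputs, neuron):
    """Mean absolute change in the gender gap when one hidden neuron is zeroed."""
    base = gender_gap(mlp_forward(inputs, W_in, W_out))
    ablated = gender_gap(mlp_forward(inputs, W_in, W_out, ablate=[neuron]))
    return float(np.mean(np.abs(base - ablated)))

# For each neuron: (index, effect on factual inputs, effect on stereotypical inputs).
effects = [
    (n, ablation_effect(factual_inputs, n), ablation_effect(stereo_inputs, n))
    for n in range(W_in.shape[1])
]
```

In this toy setting, a neuron whose ablation shifts the gap on both input sets would be the analogue of an entangled neuron: removing it to reduce stereotypical predictions would also disturb the factual ones.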

Abstract

Recent works have analyzed the impact of individual components of neural networks on gendered predictions, often with a focus on mitigating gender bias. However, mechanistic interpretations of gender tend to (i) focus on a very specific gender-related task, such as gendered pronoun prediction, or (ii) fail to distinguish between the production of factually gendered outputs (the correct assumption of gender given a word that carries gender as a semantic property) and gender biased outputs (based on a stereotype). To address these issues, we curate GKnow, a benchmark to assess gender knowledge and gender bias in language models across different types of gender-related predictions. GKnow allows us to identify and analyze circuits and individual neurons responsible for gendered predictions. We test the impact of neuron ablation on benchmarks for disentangling stereotypical and factual…
