Gender Trouble in Language Models: An Empirical Audit Guided by Gender Performativity Theory
Franziska Sofia Hafner, Ana Valdivia, Luc Rocher

TL;DR
This paper critically examines how language models encode gender. It finds that they often reinforce binary, biological-sex-based stereotypes, which can harm transgender and gender diverse people, and it calls for broader definitions of gendered harm and for mitigation strategies.
Contribution
It introduces an empirical audit, guided by gender performativity theory, that analyzes how models encode gender beyond superficial associations and highlights how they reinforce binary, biological-sex-based stereotypes.
Findings
Models tend to encode gender as binary and biologically linked.
Gender non-conforming terms are often erased or pathologized.
Larger models learn stronger gender-sex associations.
Abstract
Language models encode and subsequently perpetuate harmful gendered stereotypes. Research has succeeded in mitigating some of these harms, e.g. by dissociating non-gendered terms such as occupations from gendered terms such as 'woman' and 'man'. This approach, however, remains superficial given that associations are only one form of prejudice through which gendered harms arise. Critical scholarship on gender, such as gender performativity theory, emphasizes how harms often arise from the construction of gender itself, such as conflating gender with biological sex. In language models, these issues could lead to the erasure of transgender and gender diverse identities and cause harms in downstream applications, from misgendering users to misdiagnosing patients based on wrong assumptions about their anatomy. For FAccT research on gendered harms to go beyond superficial linguistic…
