Quantifying Gender Bias Towards Politicians in Cross-Lingual Language Models
Karolina Stańczak, Sagnik Ray Choudhury, Tiago Pimentel, Ryan Cotterell, Isabelle Augenstein

TL;DR
This study investigates gender bias in multilingual language models by analyzing adjective and verb usage around politicians' names across seven languages, revealing language-dependent biases and challenging assumptions about model size and bias.
Contribution
Introduces a multilingual probing method and a large dataset to quantify gender bias towards politicians in various language models, highlighting language-specific biases and size-related findings.
Findings
Bias varies significantly across languages.
Certain words are associated specifically with politicians of one gender.
Model size does not correlate with increased bias.
Abstract
Recent research has demonstrated that large pre-trained language models reflect societal biases expressed in natural language. The present paper introduces a simple method for probing language models to conduct a multilingual study of gender bias towards politicians. We quantify the usage of adjectives and verbs generated by language models surrounding the names of politicians as a function of their gender. To this end, we curate a dataset of 250k politicians worldwide, including their names and gender. Our study is conducted in seven languages across six different language modeling architectures. The results demonstrate that pre-trained language models' stance towards politicians varies strongly across analyzed languages. We find that while some words such as dead and designated are associated with both male and female politicians, a few specific words such as beautiful and divorced…
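The idea of quantifying word usage "as a function of gender" can be illustrated with a minimal sketch. This is not the paper's estimator: it assumes the adjectives and verbs a language model generated around each politician's name have already been collected as (gender, word) pairs, and it uses a smoothed log-odds ratio as a simple stand-in association score. The function name `log_odds_by_gender`, the smoothing constant `alpha`, and the placeholder words are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def log_odds_by_gender(generations, alpha=1.0):
    """Score each word by a smoothed log-odds ratio (female vs. male).

    generations: iterable of (gender, word) pairs, with gender in
    {"female", "male"}. Positive scores lean female, negative lean male,
    and scores near zero indicate a word used similarly for both.
    This is an illustrative stand-in, not the method used in the paper.
    """
    counts = {"female": Counter(), "male": Counter()}
    for gender, word in generations:
        counts[gender][word] += 1

    vocab = set(counts["female"]) | set(counts["male"])
    # Add-alpha smoothing so words unseen for one gender get a finite score.
    totals = {
        g: sum(c.values()) + alpha * len(vocab) for g, c in counts.items()
    }

    scores = {}
    for word in vocab:
        p_female = (counts["female"][word] + alpha) / totals["female"]
        p_male = (counts["male"][word] + alpha) / totals["male"]
        scores[word] = math.log(p_female / p_male)
    return scores


# Invented toy input with placeholder words; not data from the paper.
toy_generations = [
    ("female", "word_a"), ("female", "word_a"),
    ("female", "word_shared"), ("male", "word_shared"),
    ("male", "word_b"), ("male", "word_b"),
]
scores = log_odds_by_gender(toy_generations)
for word, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{word}: {score:+.3f}")
```

In this toy input, `word_shared` occurs equally often for both genders and scores zero, while `word_a` and `word_b` receive scores of opposite sign. A real analysis would need part-of-speech filtering, per-language handling, and a proper statistical model of the counts.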
Taxonomy
Topics: Gender Politics and Representation · Computational and Text Analysis Methods
