Measuring Gender Bias in Word Embeddings of Gendered Languages Requires Disentangling Grammatical Gender Signals

Shiva Omrani Sabbaghi; Aylin Caliskan

arXiv:2206.01691·cs.CY·June 6, 2022


TL;DR

This paper investigates how grammatical gender influences the measurement of social gender bias in word embeddings of gendered languages, proposing methods to disentangle these signals for more accurate bias assessment.

Contribution

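It introduces word embedding post-processing methods that quantify, disentangle, and evaluate grammatical gender signals, separating them from semantic gender information in the embeddings of five gendered languages from the Germanic, Romance, and Slavic branches.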

Findings

01 Disentangling reduces the strength of grammatical gender signals, measured as effect size (Cohen's d), by an average of d = 1.3 for French, German, and Italian.

02 Over 90% of inanimate nouns show weakened gender associations after disentangling.

03 Cross-lingual bias results align better with country-level implicit bias measurements.

Abstract

Does the grammatical gender of a language interfere when measuring the semantic gender information captured by its word embeddings? A number of anomalous gender bias measurements in the embeddings of gendered languages suggest this possibility. We demonstrate that word embeddings learn the association between a noun and its grammatical gender in grammatically gendered languages, which can skew social gender bias measurements. Consequently, word embedding post-processing methods are introduced to quantify, disentangle, and evaluate grammatical gender signals. The evaluation is performed on five gendered languages from the Germanic, Romance, and Slavic branches of the Indo-European language family. Our method reduces the strength of grammatical gender signals, which is measured in terms of effect size (Cohen's d), by a significant average of d = 1.3 for French, German, and Italian, and d…
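To make the effect-size measurement in the abstract concrete, here is a minimal sketch on synthetic data. It uses a WEAT-style association effect size (difference of mean associations divided by the pooled standard deviation) and a simple linear projection as a stand-in for disentangling. Both choices are assumptions for illustration: this excerpt does not specify the paper's exact procedure, and the vectors, the direction `g`, and all function names below are hypothetical.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """Mean similarity of w to set A minus mean similarity of w to set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def effect_size(X, Y, A, B):
    """WEAT-style Cohen's d: how differently X and Y associate with A vs. B."""
    sx = np.array([association(x, A, B) for x in X])
    sy = np.array([association(y, A, B) for y in Y])
    pooled = np.concatenate([sx, sy])
    return float((sx.mean() - sy.mean()) / pooled.std(ddof=1))

def remove_direction(E, g):
    """Project each row of E onto the subspace orthogonal to g.
    Assumption: a plain linear projection, not necessarily the paper's method."""
    g = g / np.linalg.norm(g)
    return E - np.outer(E @ g, g)

# Synthetic embeddings with a planted, hypothetical grammatical gender direction g.
rng = np.random.default_rng(0)
dim, n = 50, 20
g = np.zeros(dim)
g[0] = 1.0

def make(sign):
    """n noisy vectors shifted along +g or -g."""
    return sign * g + 0.1 * rng.standard_normal((n, dim))

X, Y = make(+1), make(-1)  # stand-ins for feminine / masculine inanimate nouns
A, B = make(+1), make(-1)  # stand-ins for feminine / masculine reference words

d_before = effect_size(X, Y, A, B)
d_after = effect_size(*(remove_direction(M, g) for M in (X, Y, A, B)))
print(f"d before: {d_before:.2f}, d after: {d_after:.2f}")
```

In this toy setup the planted direction is the only thing separating the two noun groups, so removing it leaves only noise and the measured effect size drops sharply. Real embeddings also carry semantic gender information along other directions, which is why the paper evaluates what remains after disentangling.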


Code & Models

Repositories

shivaomrani/gg_disentangling (Official)
