TL;DR
This paper investigates how grammatical gender influences the measurement of social gender bias in word embeddings of gendered languages, proposing methods to disentangle these signals for more accurate bias assessment.
Contribution
It introduces word embedding post-processing methods to quantify, disentangle, and evaluate grammatical gender signals, separating them from semantic gender information in five gendered languages from the Germanic, Romance, and Slavic branches.
Findings
Disentangling reduces the strength of grammatical gender signals, measured as effect size (Cohen's d), by an average of d = 1.3 for French, German, and Italian.
Over 90% of inanimate nouns show weakened gender associations after disentangling.
Cross-lingual bias results align better with country-level implicit bias measurements.
Abstract
Does the grammatical gender of a language interfere when measuring the semantic gender information captured by its word embeddings? A number of anomalous gender bias measurements in the embeddings of gendered languages suggest this possibility. We demonstrate that word embeddings learn the association between a noun and its grammatical gender in grammatically gendered languages, which can skew social gender bias measurements. Consequently, word embedding post-processing methods are introduced to quantify, disentangle, and evaluate grammatical gender signals. The evaluation is performed on five gendered languages from the Germanic, Romance, and Slavic branches of the Indo-European language family. Our method reduces the strength of grammatical gender signals, which is measured in terms of effect size (Cohen's d), by a significant average of d = 1.3 for French, German, and Italian, and d…
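As a rough illustration of the effect-size measure named in the abstract, the sketch below computes Cohen's d with a pooled standard deviation for two groups of scores. The function name `cohens_d` and the toy scores are assumptions made for illustration; the excerpt does not specify the paper's exact association test or how scores are derived from embeddings.

```python
import math
from statistics import mean, variance


def cohens_d(sample_a, sample_b):
    """Cohen's d: difference in means divided by the pooled standard deviation.

    Hypothetical helper, not taken from the paper. Each sample needs at
    least two values so that the sample variance is defined.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled_var)


# Toy association scores (made up): e.g. how strongly grammatically feminine
# vs. masculine nouns associate with female terms in an embedding space.
feminine_noun_scores = [2.0, 4.0, 6.0]
masculine_noun_scores = [1.0, 3.0, 5.0]

effect_size = cohens_d(feminine_noun_scores, masculine_noun_scores)
```

Under this reading, a drop in d after post-processing would indicate that the two groups of nouns have become harder to tell apart by their gender association scores.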
