Relating Word Embedding Gender Biases to Gender Gaps: A Cross-Cultural Analysis
Scott Friedman, Sonja Schmer-Galunder, Anthony Chen, and Jeffrey Rye

TL;DR
This paper introduces a method for quantifying gender bias in word embeddings and correlates the resulting bias metrics with statistical gender gaps across U.S. regions and countries.
Contribution
It links word embedding gender biases, measured on 2018 Twitter data, to statistical gender gaps in education, politics, economics, and health, enabling cross-cultural analysis of gender bias at scale.
Findings
Word embedding biases correlate with regional gender gaps.
Bias metrics are used to characterize statistical gender gaps in education, politics, economics, and health.
Cross-cultural differences in biases reflect societal gender norms.
Abstract
Modern models for common NLP tasks often employ machine learning techniques and train on journalistic, social media, or other culturally-derived text. These have recently been scrutinized for racial and gender biases, stemming from inherent bias in their training text. These biases are often sub-optimal and recent work poses methods to rectify them; however, these biases may shed light on actual racial or gender gaps in the culture(s) that produced the training text, thereby helping us understand cultural context through big data. This paper presents an approach for quantifying gender bias in word embeddings, and then using the resulting metrics to characterize statistical gender gaps in education, politics, economics, and health. We validate these metrics on 2018 Twitter data spanning 51 U.S. regions and 99 countries. We correlate state and country word embedding biases with 18 international and 5…
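The abstract does not spell out the paper's bias metric, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a gender axis built from the anchor words "he" and "she", a bias score defined as the mean cosine projection of theme words onto that axis, and a plain Pearson correlation against a gender-gap statistic. The function names (`gender_direction`, `bias_score`, `pearson`) and the toy vectors are hypothetical and are not taken from the paper.

```python
import numpy as np


def gender_direction(emb, male="he", female="she"):
    """Unit vector pointing from the female anchor word to the male anchor word.

    Assumption: a single he/she pair defines the gender axis.
    """
    d = np.asarray(emb[male], dtype=float) - np.asarray(emb[female], dtype=float)
    return d / np.linalg.norm(d)


def bias_score(emb, words, direction):
    """Mean cosine between each word vector and the gender direction.

    Positive values lean toward the male anchor, negative toward the female one.
    """
    scores = []
    for w in words:
        v = np.asarray(emb[w], dtype=float)
        scores.append(np.dot(v, direction) / np.linalg.norm(v))
    return float(np.mean(scores))


def pearson(xs, ys):
    """Pearson correlation between per-region bias scores and a gap statistic."""
    return float(np.corrcoef(xs, ys)[0, 1])


# Toy 2-D embedding for one hypothetical region (illustrative values only).
toy_emb = {
    "he": [1.0, 0.0],
    "she": [-1.0, 0.0],
    "engineer": [1.0, 1.0],
    "nurse": [-1.0, 1.0],
}
axis = gender_direction(toy_emb)
engineer_bias = bias_score(toy_emb, ["engineer"], axis)
nurse_bias = bias_score(toy_emb, ["nurse"], axis)
```

In a cross-regional analysis of this kind, one such score would be computed per region from an embedding trained on that region's text, and the list of scores would then be passed to `pearson` together with the matching gender-gap statistic for each region.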
Taxonomy
Topics: Computational and Text Analysis Methods · Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection
