Gender Bias in Word Embeddings: A Comprehensive Analysis of Frequency, Syntax, and Semantics
Aylin Caliskan, Pimparkar Parth Ajay, Tessa Charlesworth, Robert Wolfe, Mahzarin R. Banaji

TL;DR
This paper analyzes gender bias in widely used static English word embeddings (GloVe, fastText), showing that gender associations differ by word frequency, part of speech, semantic category, and emotional dimension (valence, arousal, dominance), and that the most frequent words reflect a masculine default in language.
Contribution
It offers a detailed, multi-faceted examination of gender biases in popular static word embeddings, including frequency, part-of-speech, semantic categories, and emotional dimensions.
Findings
Of the 1,000 most frequent words in the vocabulary, 77% are more associated with men than with women
Male-associated words are typically verbs, while female-associated words are typically adjectives and adverbs
Male-associated words score higher on arousal and dominance; female-associated words score higher on valence
Abstract
The statistical regularities in language corpora encode well-known social biases into word embeddings. Here, we focus on gender to provide a comprehensive analysis of group-based biases in widely-used static English word embeddings trained on internet corpora (GloVe 2014, fastText 2017). Using the Single-Category Word Embedding Association Test, we demonstrate the widespread prevalence of gender biases that also show differences in: (a) frequencies of words associated with men versus women; (b) part-of-speech tags in gender-associated words; (c) semantic categories in gender-associated words; and (d) valence, arousal, and dominance in gender-associated words. First, in terms of word frequency: we find that, of the 1,000 most frequent words in the vocabulary, 77% are more associated with men than women, providing direct evidence of a masculine default in the everyday language of the…
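The Single-Category Word Embedding Association Test named in the abstract can be illustrated with a minimal sketch. It assumes the commonly cited effect-size form: the difference between a word's mean cosine similarity to two attribute sets, divided by the standard deviation of its similarities to all attribute words. The toy 2-D vectors, the function names, and the sign convention (positive means closer to the first set) are assumptions for illustration, not the paper's data or code.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def sc_weat_effect_size(word_vec, first_attrs, second_attrs):
    """Effect size of one word's association with two attribute sets.

    Assumed convention: positive values mean the word is closer to
    first_attrs, negative values mean it is closer to second_attrs.
    """
    sims_first = np.array([cosine(word_vec, a) for a in first_attrs])
    sims_second = np.array([cosine(word_vec, b) for b in second_attrs])
    pooled = np.concatenate([sims_first, sims_second])
    # Sample standard deviation over all attribute similarities.
    return float((sims_first.mean() - sims_second.mean()) / pooled.std(ddof=1))

def share_closer_to_second(word_vecs, first_attrs, second_attrs):
    """Fraction of words whose effect size leans toward second_attrs."""
    scores = [sc_weat_effect_size(w, first_attrs, second_attrs) for w in word_vecs]
    return sum(s < 0 for s in scores) / len(scores)

# Hypothetical toy embeddings (real GloVe/fastText vectors have hundreds of dimensions).
female_attrs = [np.array([1.0, 0.1]), np.array([0.9, 0.2])]
male_attrs = [np.array([0.1, 1.0]), np.array([0.2, 0.9])]
words = [np.array([0.2, 1.0]), np.array([0.1, 0.8]), np.array([1.0, 0.3])]

example_score = sc_weat_effect_size(words[0], female_attrs, male_attrs)
male_share = share_closer_to_second(words, female_attrs, male_attrs)
print(example_score, male_share)
```

Applied to the most frequent words of a real vocabulary, a share like `male_share` is the kind of quantity behind the 77% figure; the paper's exact attribute word lists and any significance thresholds are not reproduced here.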
Taxonomy
Methods: fastText
