Bias in Text Embedding Models
Vasyl Rakivnenko, Nestor Maslej, Jessica Cervi, Volodymyr Zhukov

TL;DR
This paper investigates gender bias in popular text embedding models. It finds that these models often associate certain professions with gendered terms and that the associations vary across models and prompts, which businesses applying these models should be aware of.
Contribution
It provides an empirical analysis of gender bias in text embedding models, showing how biases vary across models and professions, and emphasizes the importance of addressing this bias in practical use.
Findings
Models tend to associate professions such as nurse, homemaker, and socialite with female terms
Models tend to associate professions such as CEO, manager, and boss with male terms
Bias magnitude varies across models and prompts
Abstract
Text embedding is becoming an increasingly popular AI methodology, especially among businesses, yet the potential of text embedding models to be biased is not well understood. This paper examines the degree to which a selection of popular text embedding models are biased, particularly along gendered dimensions. More specifically, this paper studies the degree to which these models associate a list of given professions with gendered terms. The analysis reveals that text embedding models are prone to gendered biases but in varying ways. Although there are certain inter-model commonalities, for instance, greater association of professions like nurse, homemaker, and socialite with female identifiers, and greater association of professions like CEO, manager, and boss with male identifiers, not all models make the same gendered associations for each occupation. Furthermore, the magnitude and…
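The kind of measurement the abstract describes, comparing how strongly a profession is associated with female versus male terms, can be sketched as a difference of mean cosine similarities. This is a minimal illustration under stated assumptions, not the authors' method: the scoring function `gender_association`, the choice of identifier words, and the three-dimensional vectors in `TOY_EMBEDDINGS` are all hypothetical and hand-made. No real embedding model is called, and the numbers say nothing about any actual model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def gender_association(profession_vec, female_vecs, male_vecs):
    """Mean similarity to female terms minus mean similarity to male terms.

    Positive: the profession sits closer to the female identifiers.
    Negative: the profession sits closer to the male identifiers.
    This scoring rule is an assumption made for illustration.
    """
    female = sum(cosine_similarity(profession_vec, v) for v in female_vecs)
    male = sum(cosine_similarity(profession_vec, v) for v in male_vecs)
    return female / len(female_vecs) - male / len(male_vecs)


# Hand-made toy vectors, NOT output from any real embedding model.
# In practice each vector would come from the model under study.
TOY_EMBEDDINGS = {
    "she": [1.0, 0.0, 0.2],
    "woman": [0.9, 0.1, 0.3],
    "he": [0.0, 1.0, 0.2],
    "man": [0.1, 0.9, 0.3],
    "profession_a": [0.8, 0.2, 0.5],  # placeholder profession
    "profession_b": [0.2, 0.8, 0.5],  # placeholder profession
}

female_vecs = [TOY_EMBEDDINGS["she"], TOY_EMBEDDINGS["woman"]]
male_vecs = [TOY_EMBEDDINGS["he"], TOY_EMBEDDINGS["man"]]

scores = {
    name: gender_association(TOY_EMBEDDINGS[name], female_vecs, male_vecs)
    for name in ("profession_a", "profession_b")
}

for name, score in scores.items():
    direction = "female" if score > 0 else "male"
    print(f"{name}: {score:+.3f} (leans {direction} in this toy setup)")
```

Repeating the same computation with vectors from several models, or with differently worded prompts around each profession, is one way to observe the kind of model-to-model and prompt-to-prompt variation the abstract reports.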
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods
