Wide range screening of algorithmic bias in word embedding models using large sentiment lexicons reveals underreported bias types
David Rozado

TL;DR
This study conducts a large-scale analysis of sentiment biases in word embeddings across various social dimensions, revealing underreported bias types and highlighting the complexity and heterogeneity of algorithmic bias.
Contribution
It introduces a comprehensive screening method using large sentiment lexicons to identify diverse and underreported biases in popular word embedding models.
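A lexicon-based screen of this kind can be illustrated with a minimal sketch. This is not the paper's actual procedure: the scoring rule (mean cosine similarity to positive lexicon words minus mean cosine similarity to negative ones), the function names, and the toy two-dimensional vectors are all assumptions made for illustration. In practice the vectors would come from a pretrained embedding model and the word lists from a large sentiment lexicon.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def sentiment_association(term, positive, negative, emb):
    """Hypothetical score: mean similarity to positive lexicon words
    minus mean similarity to negative lexicon words.
    Lexicon words missing from the embedding vocabulary are skipped."""
    vec = emb[term]
    pos = [cosine(vec, emb[w]) for w in positive if w in emb]
    neg = [cosine(vec, emb[w]) for w in negative if w in emb]
    return float(np.mean(pos) - np.mean(neg))

# Toy embeddings (assumed values, for illustration only).
toy_emb = {
    "good": np.array([1.0, 0.0]),
    "pleasant": np.array([0.9, 0.1]),
    "bad": np.array([-1.0, 0.0]),
    "awful": np.array([-0.9, 0.1]),
    "term_a": np.array([0.5, 0.5]),
    "term_b": np.array([-0.5, 0.5]),
}
positive_lexicon = ["good", "pleasant"]
negative_lexicon = ["bad", "awful"]

score_a = sentiment_association("term_a", positive_lexicon, negative_lexicon, toy_emb)
score_b = sentiment_association("term_b", positive_lexicon, negative_lexicon, toy_emb)
print(score_a, score_b)
```

Under this assumed scoring rule, a positive score means the term sits closer to the positive lexicon words, and a negative score means it sits closer to the negative ones. Comparing scores across groups of terms (for example, sets of given names) is what would surface a systematic difference.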
Findings
Systemic bias against African-American names in most models
Gender bias in embeddings is multifaceted and often reversed in polarity relative to prior reports
Previously unreported biases along the dimensions of socioeconomic status, age, appearance, religion, and politics
Abstract
This work describes a large-scale analysis of sentiment associations in popular word embedding models along the lines of gender and ethnicity, and also along the less frequently studied dimensions of socioeconomic status, age, sexual orientation, religious sentiment, and political leanings. Consistent with previous scholarly literature, this work finds systemic bias against given names popular among African-Americans in most of the embedding models examined. Gender bias in embedding models, however, appears to be multifaceted and is often reversed in polarity relative to what has been regularly reported. Using the common operationalization of the term bias in the fairness literature, previously unreported bias types in word embedding models have also been identified. Specifically, the popular embedding models analyzed here display negative biases against middle and working-class…
