Semantics derived automatically from language corpora contain human-like biases
Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan

TL;DR
This paper demonstrates that standard machine learning models trained on everyday language data automatically acquire human-like biases, reflecting societal prejudices and stereotypes present in the language corpus.
Contribution
It shows for the first time that common word embedding models encode human-like biases and introduces new methods for measuring bias in text data.
Findings
Word embeddings replicate human biases documented by the Implicit Association Test and other psychological studies, including biases concerning race and gender.
Language contains recoverable societal biases reflected in machine learning models.
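Two new evaluation methods, the Word-Embedding Association Test (WEAT) and the Word-Embedding Factual Association Test (WEFAT), measure biases encoded in word embeddings trained on text.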
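To make the WEAT finding concrete, here is a minimal sketch of a WEAT-style effect size. It compares how strongly two sets of target words (X, Y) associate with two sets of attribute words (A, B) under cosine similarity. The 2-D vectors are made-up toy data, not GloVe embeddings. The function names and the choice of sample standard deviation (`ddof=1`) are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """s(w, A, B): mean similarity of w to set A minus mean similarity to set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Difference of mean associations for X and Y, normalized by the
    standard deviation of associations over all target words in X and Y.
    ddof=1 (sample std) is an assumption of this sketch."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (np.mean(s_x) - np.mean(s_y)) / np.std(s_x + s_y, ddof=1)

# Hypothetical toy "embeddings": X lies near A, Y lies near B.
A = np.array([[1.0, 0.0], [0.9, 0.1]])   # attribute set A
B = np.array([[0.0, 1.0], [0.1, 0.9]])   # attribute set B
X = np.array([[1.0, 0.1], [0.8, 0.2]])   # target set X
Y = np.array([[0.1, 1.0], [0.2, 0.8]])   # target set Y

d = weat_effect_size(X, Y, A, B)
print(f"effect size: {d:.3f}")
```

A positive value means the X words sit closer to A than the Y words do. Swapping the two target sets flips the sign. With real embeddings, each row would be the vector looked up for a word in the corresponding stimulus list.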
Abstract
Artificial intelligence and machine learning are in a period of astounding growth. However, there are concerns that these technologies may be used, either with or without intention, to perpetuate the prejudice and unfairness that unfortunately characterizes many human institutions. Here we show for the first time that human-like semantic biases result from the application of standard machine learning to ordinary language---the same sort of language humans are exposed to every day. We replicate a spectrum of standard human biases as exposed by the Implicit Association Test and other well-known psychological studies. We replicate these using a widely used, purely statistical machine-learning model---namely, the GloVe word embedding---trained on a corpus of text from the Web. Our results indicate that language itself contains recoverable and accurate imprints of our historic biases,…
Taxonomy
Methods
GloVe Embeddings
