Understanding Undesirable Word Embedding Associations

Kawin Ethayarajh, David Duvenaud, and Graeme Hirst

arXiv:1908.06361·cs.CL·August 20, 2019·5 cites

Open Access

TL;DR

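This paper investigates undesirable associations in word embeddings. It shows that post hoc debiasing via subspace projection is, under certain conditions, equivalent to training on an unbiased corpus, and it introduces a new association measure, RIPA, which reveals that SGNS amplifies gender stereotypes for gender-stereotyped words.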

Contribution

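It proves that, for embedding models that implicitly perform matrix factorization, post hoc debiasing via subspace projection is equivalent under certain conditions to training on an unbiased corpus, and it introduces RIPA, a new measure of word embedding association.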

Findings

01

Debiasing via subspace projection is theoretically equivalent to unbiased training under certain conditions.

02

WEAT overestimates bias systematically.

03

SGNS amplifies gender stereotypes for gender-stereotyped words.
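
As a rough illustration of the subspace projection idea in the first finding, the sketch below removes a word vector's component along a single bias direction. It is a simplification under stated assumptions: the toy three-dimensional vectors and the helper names (dot, unit, project_out) are hypothetical, and the bias direction is taken from one word pair rather than from a subspace estimated over many pairs.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def project_out(w, b):
    """Remove from w its component along the unit vector b."""
    scale = dot(w, b)
    return [wi - scale * bi for wi, bi in zip(w, b)]

# Toy vectors, invented for illustration only (not real embeddings).
he = [1.0, 0.2, 0.0]
she = [-1.0, 0.2, 0.0]
doctor = [0.3, 0.5, 0.8]

# Assumption: a one-dimensional bias subspace defined by a single pair.
bias_dir = unit([a - b for a, b in zip(he, she)])
debiased = project_out(doctor, bias_dir)

# After projection, the vector has no component along the bias direction.
print(dot(debiased, bias_dir))
```

The debiased vector keeps its coordinates orthogonal to the bias direction unchanged, which is why the method is described as a projection rather than a retraining step.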

Abstract

Word embeddings are often criticized for capturing undesirable word associations such as gender stereotypes. However, methods for measuring and removing such biases remain poorly understood. We show that for any embedding model that implicitly does matrix factorization, debiasing vectors post hoc using subspace projection (Bolukbasi et al., 2016) is, under certain conditions, equivalent to training on an unbiased corpus. We also prove that WEAT, the most common association test for word embeddings, systematically overestimates bias. Given that the subspace projection method is provably effective, we use it to derive a new measure of association called the relational inner product association (RIPA). Experiments with RIPA reveal that, on average, skipgram with negative sampling (SGNS) does not make most words any more gendered than they are in the training corpus. However, for…
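
The abstract excerpt names RIPA without defining it. A minimal sketch, assuming the single-pair form in which RIPA is the inner product of a word vector with the normalized difference of a defining word pair, might look like the following. The function name ripa and the toy vectors are hypothetical, and the definition used here is an assumption that should be checked against the full paper.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def ripa(w, x, y):
    """Assumed single-pair RIPA: <w, (x - y) / ||x - y||>."""
    diff = [a - b for a, b in zip(x, y)]
    norm = math.sqrt(dot(diff, diff))
    relation = [d / norm for d in diff]
    return dot(w, relation)

# Toy vectors, invented for illustration only (not real embeddings).
he = [1.0, 0.2, 0.0]
she = [-1.0, 0.2, 0.0]
doctor = [0.3, 0.5, 0.8]
nurse = [-0.4, 0.5, 0.7]

# Under this sign convention, positive values lean toward the first
# word of the pair ("he") and negative values toward the second ("she").
print(ripa(doctor, he, she))
print(ripa(nurse, he, she))
```

Unlike a cosine-based score, this quantity is not normalized by the length of the word vector, so it depends on the vector's magnitude as well as its direction.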

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Hate Speech and Cyberbullying Detection