TL;DR
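This paper uses natural language inference to measure stereotypical bias in word embeddings, shows that bias mitigation reduces the invalid inferences made by models built on static embeddings (GloVe), and shows that, for gender bias, the same techniques extend to the static components of contextualized embeddings (ELMo, BERT).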
Contribution
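It proposes a mechanism that measures stereotypes in word embeddings through the task of natural language inference, and demonstrates bias mitigation strategies for static embeddings (GloVe) and, for gender bias, for the static components of contextualized embeddings (ELMo, BERT).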
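A minimal sketch of how such a measurement could look, assuming sentence pairs are built from a template and that an unbiased model should label each pair neutral. The template, the word lists, the `toy_predict` stub, and the three summary scores are illustrative assumptions, not details given on this page.

```python
from typing import Callable, Dict, List, Tuple

Probs = Dict[str, float]  # keys: "entailment", "neutral", "contradiction"


def make_pairs(template: str, occupations: List[str],
               gendered: List[str]) -> List[Tuple[str, str]]:
    """Build (premise, hypothesis) pairs that differ only in the subject."""
    return [(template.format(subject=occ), template.format(subject=g))
            for occ in occupations for g in gendered]


def neutrality_scores(pairs: List[Tuple[str, str]],
                      predict: Callable[[str, str], Probs],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Summarize how often the model treats the pairs as neutral.

    Lower scores mean more invalid (stereotyped) inferences.
    """
    all_probs = [predict(p, h) for p, h in pairs]
    n = len(all_probs)
    return {
        # Average probability mass on the neutral label.
        "mean_neutral": sum(p["neutral"] for p in all_probs) / n,
        # Share of pairs whose top label is neutral.
        "fraction_neutral": sum(max(p, key=p.get) == "neutral"
                                for p in all_probs) / n,
        # Share of pairs whose neutral probability reaches the threshold.
        "fraction_above_threshold": sum(p["neutral"] >= threshold
                                        for p in all_probs) / n,
    }


def toy_predict(premise: str, hypothesis: str) -> Probs:
    """Hypothetical stand-in for a trained NLI model (one stereotyped pair)."""
    if "nurse" in premise and "woman" in hypothesis:
        return {"entailment": 0.7, "neutral": 0.2, "contradiction": 0.1}
    return {"entailment": 0.1, "neutral": 0.8, "contradiction": 0.1}


pairs = make_pairs("The {subject} ate a bagel.",
                   ["doctor", "nurse"], ["man", "woman"])
scores = neutrality_scores(pairs, toy_predict)
```

In practice `toy_predict` would be replaced by a real NLI model trained on top of the embeddings under test, and the scores would be compared before and after mitigation.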
Findings
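Stereotypes in word embeddings surface as invalid inferences in natural language inference models, which makes them measurable
Bias mitigation strategies reduce invalid inferences for static GloVe embeddings
For gender bias, mitigation extends to ELMo and BERT when applied only to their static components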
Abstract
Word embeddings carry stereotypical connotations from the text they are trained on, which can lead to invalid inferences in downstream models that rely on them. We use this observation to design a mechanism for measuring stereotypes using the task of natural language inference. We demonstrate a reduction in invalid inferences via bias mitigation strategies on static word embeddings (GloVe). Further, we show that for gender bias, these techniques extend to contextualized embeddings when applied selectively only to the static components of contextualized embeddings (ELMo, BERT).
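The abstract does not say which mitigation strategies are used. As one illustration, the sketch below removes a gender direction from static word vectors by linear projection, a common debiasing approach that is assumed here rather than taken from this page. The toy vectors, the single definitional pair, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def bias_direction(emb: Dict[str, Vector],
                   pairs: List[Tuple[str, str]]) -> Vector:
    """Unit vector along the mean difference of the given word pairs."""
    dim = len(next(iter(emb.values())))
    mean_diff = [0.0] * dim
    for a, b in pairs:
        for i in range(dim):
            mean_diff[i] += (emb[a][i] - emb[b][i]) / len(pairs)
    norm = math.sqrt(dot(mean_diff, mean_diff))
    return [x / norm for x in mean_diff]


def project_out(emb: Dict[str, Vector],
                direction: Vector) -> Dict[str, Vector]:
    """Subtract from every vector its component along the bias direction."""
    cleaned = {}
    for word, vec in emb.items():
        scale = dot(vec, direction)
        cleaned[word] = [x - scale * d for x, d in zip(vec, direction)]
    return cleaned


# Toy 3-dimensional vectors (assumption: real GloVe vectors are far larger).
emb = {
    "he": [1.0, 0.2, 0.0],
    "she": [-1.0, 0.2, 0.0],
    "doctor": [0.5, 0.5, 0.3],
    "nurse": [-0.6, 0.5, 0.3],
}
direction = bias_direction(emb, [("he", "she")])
cleaned = project_out(emb, direction)
```

For contextualized models, the abstract indicates that mitigation is applied only to the static components, so under this sketch the projection would be applied to the model's static input embeddings and the contextual layers left unchanged.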
