TL;DR
This paper quantifies intersectional biases in English language models by measuring the valence (pleasantness/unpleasantness) associations of social groups. It finds the strongest biases for gender identity, social class, and sexual orientation, and reports that larger, better-performing models are more biased.
Contribution
It introduces a concept projection method that captures a valence subspace in contextualized word embeddings, and adapts it to embedding association tests to measure intersectional bias.
Findings
Language models show strong biases against gender identity, social class, and sexual orientation.
Larger, better-performing models tend to exhibit more bias.
The proposed method outperforms existing evaluation techniques on valence tasks.
Abstract
Language models are trained on large-scale corpora that embed implicit biases documented in psychology. Valence associations (pleasantness/unpleasantness) of social groups determine the biased attitudes towards groups and concepts in social cognition. Building on this established literature, we quantify how social groups are valenced in English language models using a sentence template that provides an intersectional context. We study biases related to age, education, gender, height, intelligence, literacy, race, religion, sex, sexual orientation, social class, and weight. We present a concept projection approach to capture the valence subspace through contextualized word embeddings of language models. Adapting the projection-based approach to embedding association tests that quantify bias, we find that language models exhibit the most biased attitudes against gender identity, social…
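The projection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the valence subspace is one-dimensional and approximates it by the difference between mean pleasant and mean unpleasant embeddings, and it uses synthetic vectors in place of contextualized embeddings taken from a language model. All function names, the dimensionality, and the data are hypothetical.

```python
import numpy as np


def valence_direction(pleasant, unpleasant):
    """Unit vector pointing from the mean unpleasant to the mean pleasant embedding.

    Assumption: a single direction stands in for the valence subspace.
    """
    direction = pleasant.mean(axis=0) - unpleasant.mean(axis=0)
    return direction / np.linalg.norm(direction)


def projection_scores(embeddings, direction):
    """Scalar projection of each embedding (one per row) onto the valence direction."""
    return embeddings @ direction


def effect_size(scores_a, scores_b):
    """Difference in mean projection, scaled by the std of all scores (a Cohen's d-style statistic)."""
    pooled = np.concatenate([scores_a, scores_b])
    return (scores_a.mean() - scores_b.mean()) / pooled.std(ddof=1)


# Synthetic stand-ins for contextualized embeddings (hypothetical data).
rng = np.random.default_rng(0)
dim = 16
axis = np.zeros(dim)
axis[0] = 1.0  # the "true" valence axis used to generate the toy data

pleasant = rng.normal(size=(25, dim)) + 2.0 * axis
unpleasant = rng.normal(size=(25, dim)) - 2.0 * axis
group_a = rng.normal(size=(30, dim)) + 1.0 * axis  # shifted toward pleasant
group_b = rng.normal(size=(30, dim)) - 1.0 * axis  # shifted toward unpleasant

direction = valence_direction(pleasant, unpleasant)
d = effect_size(
    projection_scores(group_a, direction),
    projection_scores(group_b, direction),
)
print(f"effect size (group_a vs. group_b): {d:.2f}")
```

A positive effect size here means group_a projects further toward the pleasant end of the direction than group_b. In a real evaluation the rows would be embeddings of group terms placed in a sentence template, as the abstract describes.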
