Evaluating Biased Attitude Associations of Language Models in an Intersectional Context

Shiva Omrani Sabbaghi; Robert Wolfe; Aylin Caliskan

arXiv:2307.03360·cs.CY·July 10, 2023

TL;DR

This paper quantifies intersectional biases in language models by analyzing the valence associations of social groups. It reports the strongest biases for gender identity, social class, and sexual orientation, and finds that bias is amplified in larger models.

Contribution

The paper introduces a concept projection method for measuring intersectional biases in the contextualized embeddings of language models.

Findings

01. Language models show strong biases against gender identity, social class, and sexual orientation.

02. Larger, better-performing models tend to exhibit more bias.

03. The proposed method outperforms existing evaluation techniques on valence tasks.

Abstract

Language models are trained on large-scale corpora that embed implicit biases documented in psychology. Valence associations (pleasantness/unpleasantness) of social groups determine the biased attitudes towards groups and concepts in social cognition. Building on this established literature, we quantify how social groups are valenced in English language models using a sentence template that provides an intersectional context. We study biases related to age, education, gender, height, intelligence, literacy, race, religion, sex, sexual orientation, social class, and weight. We present a concept projection approach to capture the valence subspace through contextualized word embeddings of language models. Adapting the projection-based approach to embedding association tests that quantify bias, we find that language models exhibit the most biased attitudes against gender identity, social…
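
The projection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the valence direction is the normalized difference between the mean embeddings of pleasant and unpleasant stimuli, uses synthetic vectors in place of contextualized embeddings, and borrows a WEAT-style effect size (difference of mean projections divided by the pooled standard deviation). The function names `valence_direction`, `project`, and `effect_size` are hypothetical.

```python
import numpy as np

def valence_direction(pleasant, unpleasant):
    """Unit vector pointing from the unpleasant centroid to the pleasant centroid.

    Assumption: a difference-of-means direction stands in for the paper's
    valence subspace, which is not specified in the abstract.
    """
    diff = pleasant.mean(axis=0) - unpleasant.mean(axis=0)
    return diff / np.linalg.norm(diff)

def project(embeddings, direction):
    """Scalar projection of each embedding (row) onto the valence direction."""
    return embeddings @ direction

def effect_size(scores_a, scores_b):
    """WEAT-style effect size: mean difference over the pooled sample std."""
    pooled = np.concatenate([scores_a, scores_b])
    return (scores_a.mean() - scores_b.mean()) / pooled.std(ddof=1)

# Synthetic stand-ins for contextualized embeddings (illustration only).
rng = np.random.default_rng(0)
dim = 16
axis = np.zeros(dim)
axis[0] = 1.0  # hidden "valence" axis used to generate the toy data

pleasant = 0.1 * rng.normal(size=(25, dim)) + axis
unpleasant = 0.1 * rng.normal(size=(25, dim)) - axis
group_a = 0.1 * rng.normal(size=(20, dim)) + 0.5 * axis
group_b = 0.1 * rng.normal(size=(20, dim)) - 0.5 * axis

direction = valence_direction(pleasant, unpleasant)
d = effect_size(project(group_a, direction), project(group_b, direction))
print(f"effect size: {d:.2f}")
```

A positive effect size here means the first group projects further toward the pleasant pole than the second. With real models, the rows would be embeddings of the social-group term extracted from the sentence template rather than random vectors.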

Code & Models

Repositories

shivaomrani/llm-bias (PyTorch, official)
