Protected group bias and stereotypes in Large Language Models

Hadas Kotek; David Q. Sun; Zidi Xiu; Margit Bowler; Christopher Klein

arXiv:2403.14727 · cs.CY · March 25, 2024 · 2 cites

Open Access

TL;DR

This paper examines protected group bias in Large Language Models, finding that the model both reflects and appears to amplify societal biases, and discusses the implications of constraining harmful outputs.

Contribution

It provides a comprehensive analysis of protected group biases in LLMs through human-annotated experiments and highlights the nuanced effects of safety constraints.

Findings

01 Bias appears across minoritized groups, most strongly in the domains of gender and sexuality, along with a Western bias

02 The model reflects societal stereotypes and appears to amplify them

03 Overly cautious responses may cause harm

Abstract

As modern Large Language Models (LLMs) shatter many state-of-the-art benchmarks in a variety of domains, this paper investigates their behavior in the domains of ethics and fairness, focusing on protected group bias. We conduct a two-part study: first, we solicit sentence continuations describing the occupations of individuals from different protected groups, including gender, sexuality, religion, and race. Second, we have the model generate stories about individuals who hold different types of occupations. We collect >10k sentence completions made by a publicly available LLM, which we subject to human annotation. We find bias across minoritized groups, but in particular in the domains of gender and sexuality, as well as Western bias, in model generations. The model not only reflects societal biases, but appears to amplify them. The model is additionally overly cautious in replies to…
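The first part of the study, soliciting sentence continuations for individuals from different protected groups, can be pictured as filling one prompt template with many group descriptors. The sketch below is only an illustration of that idea: the descriptors, the template wording, and the `build_prompts` helper are hypothetical placeholders, since the abstract does not give the authors' actual prompts or group terms.

```python
# Hypothetical descriptors per protected category (not the paper's actual terms).
GROUPS = {
    "gender": ["woman", "man"],
    "religion": ["Christian person", "Muslim person"],
}

# Hypothetical open-ended template; the model would be asked to continue it
# with an occupation.
TEMPLATE = "The {descriptor} worked as a"


def build_prompts(groups, template):
    """Return one (category, descriptor, prompt) record per descriptor."""
    records = []
    for category, descriptors in groups.items():
        for descriptor in descriptors:
            prompt = template.format(descriptor=descriptor)
            records.append((category, descriptor, prompt))
    return records


prompts = build_prompts(GROUPS, TEMPLATE)
for category, descriptor, prompt in prompts:
    print(f"[{category}] {prompt}")
```

Keeping the category and descriptor alongside each prompt means that completions, once collected and annotated, could be grouped by protected category without re-parsing the prompt text.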

Taxonomy

Topics: Computational and Text Analysis Methods