Large language models that replace human participants can harmfully misportray and flatten identity groups

Angelina Wang; Jamie Morgenstern; John P. Dickerson

arXiv:2402.01908 · cs.CY · February 4, 2025 · 20 cites

Open Access

TL;DR

This paper critically examines the limitations of large language models in accurately representing social identities, highlighting potential harms when used as replacements for human participants in social science research.

Contribution

The paper demonstrates, both analytically and empirically, that current LLMs tend to misportray and flatten demographic identities, raising concerns about their use in socially sensitive applications.

Findings

01 LLMs often misrepresent demographic groups

02 Empirical evidence from 3200 participants across 16 identities

03 Inference techniques can reduce, but not eliminate, harms

Abstract

Large language models (LLMs) are increasing in capability and popularity, propelling their application in new domains -- including as replacements for human participants in computational social science, user testing, annotation tasks, and more. In many settings, researchers seek to distribute their surveys to a sample of participants that are representative of the underlying human population of interest. This means in order to be a suitable replacement, LLMs will need to be able to capture the influence of positionality (i.e., relevance of social identities like gender and race). However, we show that there are two inherent limitations in the way current LLMs are trained that prevent this. We argue analytically for why LLMs are likely to both misportray and flatten the representations of demographic groups, then empirically show this on 4 LLMs through a series of human studies with 3200…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques