LLM Agents Grounded in Self-Reports Enable General-Purpose Simulation of Individuals

Joon Sung Park; Carolyn Q. Zou; Jonne Kamphorst; Niles Egan; Aaron Shaw; Benjamin Mako Hill; Carrie Cai; Meredith Ringel Morris; Percy Liang; Robb Willer; Michael S. Bernstein

arXiv:2411.10109 · cs.AI · April 23, 2026 · 35 citations

TL;DR

This paper demonstrates that large language models, grounded in self-report data, can effectively simulate individual behaviors and traits across various outcomes, surpassing demographic-only baselines.

Contribution

It introduces a method for creating person-specific LLM agents grounded in self-report data, enabling broad individual simulation without task-specific training.

Findings

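01. Relative to participants' own two-week test-retest results, agent accuracy on held-out General Social Survey items reached up to 86% (interview and survey data combined).

02. Agents predicted personality traits and behaviors with high accuracy.

03. Grounded agents reduced disparities across racial and ideological groups.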

Abstract

Machine learning can predict human behavior well when substantial structured data and well-defined outcomes are available, but these models are typically limited to specific outcomes and cannot readily be applied to new domains. We test whether large language models (LLMs) can support a more general-purpose approach by building person-specific simulations (i.e., "generative agents") grounded in self-report data. Using data from a diverse national sample of 1,052 Americans, we build agents from (i) two-hour, semi-structured interviews (elicited using the American Voices Project interview schedule), (ii) structured surveys (the General Social Survey and Big Five personality inventory), or (iii) both sources combined. On held-out General Social Survey items, agent accuracy reached 83% (interview only), 82% (surveys only), and 86% (combined) of participants' two-week test-retest…
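The percentages in the abstract are stated relative to participants' two-week test-retest results rather than as raw agreement. The following is a minimal sketch of one way such a normalized score could be computed, assuming it is the agent's raw accuracy divided by the participant's own agreement between two survey waves. The abstract is truncated here, so that definition is an assumption, and the function names and toy answers are hypothetical, not taken from the paper.

```python
from typing import Sequence


def raw_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of items on which the two answer lists agree."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("answer lists must be non-empty and equal in length")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


def normalized_accuracy(
    agent: Sequence[str], wave1: Sequence[str], wave2: Sequence[str]
) -> float:
    """Agent accuracy against wave 1, divided by the participant's own
    wave 1 vs. wave 2 agreement (assumed normalization, see lead-in)."""
    consistency = raw_accuracy(wave2, wave1)
    if consistency == 0:
        raise ValueError("participant consistency is zero; score undefined")
    return raw_accuracy(agent, wave1) / consistency


# Toy data for one hypothetical participant (not from the paper).
wave1 = ["agree", "disagree", "agree", "neutral", "agree"]
wave2 = ["agree", "disagree", "neutral", "neutral", "agree"]  # matches 4 of 5
agent = ["agree", "disagree", "agree", "agree", "disagree"]   # matches 3 of 5

score = normalized_accuracy(agent, wave1, wave2)
print(f"normalized accuracy: {score:.2f}")
```

Under this reading, a score of 1.0 would mean the agent matches the participant's original answers as often as the participant does two weeks later, which is why a normalized figure can sit well above the raw agreement rate.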

Code & Models

Datasets

implicit-personalization/agentbank_personas (dataset, 80 downloads)
