Virtual Personas for Language Models via an Anthology of Backstories

Suhong Moon; Marwa Abdulhai; Minwoo Kang; Joseph Suh; Widyadewi Soedarmadji; Eran Kohen Behar; David M. Chan; John Canny

arXiv:2407.06576 · cs.CL · May 12, 2026 · 3 citations

TL;DR

This paper introduces "Anthology", a method for conditioning large language models on virtual personas using open-ended backstories, which improves response consistency and the representation of diverse sub-populations in behavioral studies.

Contribution

The paper presents an approach to steering LLM responses toward specific personas using life narratives, improving the reliability of experimental outcomes and the coverage of diverse sub-populations.
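To make "conditioning on a backstory" concrete, here is a minimal sketch, assuming the backstory is simply placed before a multiple-choice survey question in the prompt. The `SurveyQuestion` class, the `build_persona_prompt` function, and the prompt template are all hypothetical and are not taken from the paper or the `cannylab/anthology` repository.

```python
from dataclasses import dataclass


@dataclass
class SurveyQuestion:
    """A multiple-choice survey item (hypothetical structure for illustration)."""
    text: str
    options: list[str]


def build_persona_prompt(backstory: str, question: SurveyQuestion) -> str:
    """Condition on a virtual persona by placing its backstory before the question.

    The template is an assumed example, not the paper's actual prompt.
    """
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    if len(question.options) > len(letters):
        raise ValueError("too many answer options to label with letters")
    # Label options (A), (B), ... so the model can answer with a single letter.
    option_lines = [f"({letters[i]}) {opt}" for i, opt in enumerate(question.options)]
    return "\n".join(
        [backstory.strip(), "", f"Question: {question.text}", *option_lines, "Answer:"]
    )


if __name__ == "__main__":
    q = SurveyQuestion(
        "How would you rate economic conditions in your area?",
        ["Excellent", "Good", "Only fair", "Poor"],
    )
    # Hypothetical backstories; each one defines a different virtual persona.
    backstories = [
        "I grew up on a farm in Iowa and have worked as a nurse for twenty years.",
        "I moved to Chicago after college and now run a small bakery.",
    ]
    for b in backstories:
        print(build_persona_prompt(b, q))
        print("---")
```

Sending each persona's prompt to the same model and collecting the chosen letters yields one simulated response per virtual persona; swapping the backstory is the only change between personas.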

Findings

01

Up to 18% improvement in matching the response distributions of human survey respondents.

02

Up to 27% improvement in response consistency.

03

Evaluated on three nationally representative surveys from Pew Research Center's American Trends Panel (ATP).
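The first finding refers to how closely simulated answers match human answer distributions. This page does not say which distance underlies the 18% figure, so the sketch below assumes one common choice for ordinal survey scales: the 1-D Wasserstein (earth mover's) distance between two categorical distributions over the same ordered answer options, with unit spacing between options. The function names are hypothetical.

```python
from collections import Counter
from itertools import accumulate


def response_distribution(responses: list[int], num_options: int) -> list[float]:
    """Empirical distribution over answer options 0..num_options-1."""
    if not responses:
        raise ValueError("need at least one response")
    counts = Counter(responses)
    n = len(responses)
    return [counts.get(k, 0) / n for k in range(num_options)]


def wasserstein_1d(p: list[float], q: list[float]) -> float:
    """W1 distance between distributions on ordinal points 0..K-1 (unit spacing).

    In one dimension, W1 equals the summed absolute gap between the two CDFs.
    """
    if len(p) != len(q):
        raise ValueError("distributions must share the same answer options")
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    # The final CDF value is 1 for both distributions, so it is skipped.
    return sum(abs(a - b) for a, b in zip(cdf_p[:-1], cdf_q[:-1]))


if __name__ == "__main__":
    # Hypothetical answers on a 4-point scale (0 = "Excellent" ... 3 = "Poor").
    human = response_distribution([0, 0, 1, 2, 3, 3], 4)
    persona = response_distribution([0, 1, 1, 2, 2, 3], 4)
    print(round(wasserstein_1d(human, persona), 4))
```

A lower distance means the virtual personas' answers are distributed more like the human respondents'. Using CDFs rather than per-option differences means that a wrong answer one step away on the scale is penalized less than one three steps away.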

Abstract

Large language models (LLMs) are trained from vast repositories of text authored by millions of distinct authors, reflecting an enormous diversity of human traits. While these models bear the potential to be used as approximations of human subjects in behavioral studies, prior efforts have been limited in steering model responses to match individual human users. In this work, we introduce "Anthology", a method for conditioning LLMs to particular virtual personas by harnessing open-ended life narratives, which we refer to as "backstories." We show that our methodology enhances the consistency and reliability of experimental outcomes while ensuring better representation of diverse sub-populations. Across three nationally representative human surveys conducted as part of Pew Research Center's American Trends Panel (ATP), we demonstrate that Anthology achieves up to 18% improvement in…
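The abstract also claims better consistency and reliability. The truncated text here does not name the measure used, so the following is only an assumed illustration using Cronbach's alpha, a standard internal-consistency coefficient: for a battery of related survey items meant to measure one attitude, rows are respondents (here, backstory-conditioned virtual personas) and columns are items. The `cronbach_alpha` function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for scores[r][i] = numeric answer of respondent r to item i.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical answers from four virtual personas to two related items.
    persona_answers = [[1, 2], [2, 2], [3, 3], [4, 3]]
    print(round(cronbach_alpha(persona_answers), 3))
```

Values near 1 indicate that each persona answers related items in a mutually consistent way; values near 0 indicate answers that look unrelated across items, as one would expect if the model were not holding a stable persona.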

Code & Models

Repositories

cannylab/anthology (GitHub)

Datasets

SuhongMoon/anthology_backstory (dataset, 53 downloads)