Quantifying the Persona Effect in LLM Simulations
Tiancheng Hu, Nigel Collier

TL;DR
This paper examines how persona variables influence large language models' ability to simulate human perspectives, finding modest improvements with prompting and a linear relationship between persona relevance and prediction accuracy.
Contribution
It quantifies the impact of persona variables on LLM simulations and demonstrates how prompting can enhance performance in subjective NLP tasks.
Findings
Persona variables explain less than 10% of annotation variance.
Persona prompting yields modest but statistically significant improvements in LLM predictions.
In a zero-shot setting, a 70b model with persona prompting captures 81% of the achievable annotation variance.
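As a rough illustration of what persona prompting can look like, the sketch below prepends persona variables to a zero-shot annotation prompt. The function name `build_persona_prompt`, the prompt wording, and the example persona fields are all hypothetical; they are not taken from the paper, whose actual templates may differ.

```python
def build_persona_prompt(persona, text, question):
    """Compose a zero-shot prompt that prepends persona variables to the task.

    Hypothetical template for illustration only, not the paper's wording.
    """
    # Render each persona variable as "- name: value" on its own line.
    persona_lines = [f"- {name}: {value}" for name, value in persona.items()]
    return "\n".join([
        "Answer as a person with the following characteristics:",
        *persona_lines,
        "",
        f"Text: {text}",
        f"Question: {question}",
        "Answer:",
    ])


# Example persona with made-up demographic values.
prompt = build_persona_prompt(
    {"age": "35-44", "gender": "woman", "education": "college degree"},
    "Example comment to be rated.",
    "On a scale from 1 to 5, how offensive is this text?",
)
print(prompt)
```

Dropping the persona lines from the same template gives a no-persona baseline, so the two predictions can be compared against each annotator's label.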
Abstract
Large language models (LLMs) have shown remarkable promise in simulating human language and behavior. This study investigates how integrating persona variables (demographic, social, and behavioral factors) impacts LLMs' ability to simulate diverse perspectives. We find that persona variables account for less than 10% of the variance in annotations in existing subjective NLP datasets. Nonetheless, incorporating persona variables via prompting in LLMs provides modest but statistically significant improvements. Persona prompting is most effective on samples where many annotators disagree but their disagreements are relatively minor. Notably, we find a linear relationship in our setting: the stronger the correlation between persona variables and human annotations, the more accurate the LLM predictions are using persona prompting. In a zero-shot setting, a powerful 70b model with persona prompting captures…
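The "variance accounted for" figure can be read as a coefficient of determination (R²) from regressing annotations on persona variables. The sketch below is a minimal version of that idea, assuming a single categorical persona variable, ordinary least squares, and synthetic ratings; the paper's own regression setup may differ, and the name `variance_explained` and all data here are illustrative assumptions.

```python
import numpy as np


def variance_explained(persona_codes, n_categories, annotations):
    """R^2 of an OLS fit of annotations on a one-hot encoded persona variable."""
    y = np.asarray(annotations, dtype=float)
    # One-hot encode; each row sums to one, so the columns span an intercept.
    X = np.eye(n_categories)[np.asarray(persona_codes)]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ coef
    ss_res = float(residual @ residual)           # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())   # total variation
    return 1.0 - ss_res / ss_tot


# Synthetic data (assumption): three persona groups with small mean shifts
# and large individual noise, so the persona variable explains little variance.
rng = np.random.default_rng(0)
codes = rng.integers(0, 3, size=500)
group_shift = np.array([0.0, 0.3, 0.6])
ratings = 3.0 + group_shift[codes] + rng.normal(0.0, 1.0, size=500)

r2 = variance_explained(codes, 3, ratings)
print(f"share of annotation variance explained: {r2:.3f}")
```

When group differences are small relative to individual noise, as in this synthetic setup, the R² stays low, which is the kind of pattern the abstract reports for existing datasets.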
Taxonomy
Topics: Human-Automation Interaction and Safety
Methods: Linear Regression
