PrivacySIM: Evaluating LLM Simulation of User Privacy Behavior
James Flemings, Murali Annavaram

TL;DR
PrivacySIM evaluates how well large language models can simulate individual users' privacy decisions, benchmarking them against the ground-truth responses of 1,000 users drawn from five published privacy user studies.
Contribution
Introduces PrivacySIM, an evaluation suite for benchmarking LLMs in simulating individual privacy behavior based on user personas.
Findings
Conditioning on privacy personas improves simulation accuracy.
Even the best models achieve only 40.4% accuracy in simulating privacy decisions.
Stated privacy attitudes often diverge from actual user behavior.
Abstract
Large language models (LLMs) are increasingly used to simulate human behavior, but their ability to simulate privacy decisions is not well understood. In this paper, we address the problem of evaluating whether a core set of user persona attributes can drive LLMs to simulate individual-level privacy behavior. We introduce PrivacySIM, an evaluation suite that benchmarks LLM simulation of user privacy behavior against the ground-truth responses of 1,000 users. These users are drawn from five published user studies on privacy spanning LLM healthcare consultations, conversational agents, and chatbots. Drawing on these user studies, we hypothesize three persona facets as plausible predictors of privacy decision-making: demographics, previous experiences, and stated privacy attitudes. We condition nine frontier LLMs on subsets of these three facets and measure how often each…
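The conditioning setup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes that "accuracy" means the fraction of users whose simulated answer matches their recorded answer, and that the empty persona is included as a baseline. The record fields, the toy users, and `toy_simulate` (a stand-in for a real LLM call) are all hypothetical.

```python
from itertools import combinations

# The three persona facets named in the abstract (identifier spellings are assumptions).
FACETS = ("demographics", "previous_experiences", "privacy_attitudes")

def facet_subsets(facets=FACETS):
    """Return every subset of the facets, from the empty persona to all three."""
    return [c for r in range(len(facets) + 1) for c in combinations(facets, r)]

def build_persona(user, subset):
    """Keep only the chosen facets of a user's record."""
    return {facet: user[facet] for facet in subset}

def accuracy(users, simulate, subset):
    """Fraction of users whose simulated answer equals their recorded answer."""
    hits = sum(
        simulate(build_persona(u, subset), u["question"]) == u["answer"]
        for u in users
    )
    return hits / len(users)

# Hypothetical toy records; real entries would come from the user studies.
users = [
    {"demographics": "age 30", "previous_experiences": "data breach",
     "privacy_attitudes": "concerned",
     "question": "Share health details with the chatbot?", "answer": "no"},
    {"demographics": "age 55", "previous_experiences": "none",
     "privacy_attitudes": "unconcerned",
     "question": "Share health details with the chatbot?", "answer": "yes"},
]

def toy_simulate(persona, question):
    """Hypothetical stand-in for an LLM call: answers from the stated attitude only."""
    return "no" if persona.get("privacy_attitudes") == "concerned" else "yes"

# Accuracy for each conditioning subset, keyed by the tuple of facets used.
results = {subset: accuracy(users, toy_simulate, subset) for subset in facet_subsets()}
```

Replacing `toy_simulate` with a function that prompts a model using the persona and question would give one accuracy figure per facet subset, which makes it possible to compare how much each facet contributes.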
