How Well Can LLM Agents Simulate End-User Security and Privacy Attitudes and Behaviors?

Yuxuan Li; Leyang Li; Hao-Ping Lee; Sauvik Das

arXiv:2602.18464·cs.CY·February 25, 2026

Open Access

TL;DR

This paper evaluates how well large language model (LLM) agents simulate human attitudes and behaviors toward security and privacy threats, and finds significant gaps in their alignment with real human responses.

Contribution

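Introduces SP-ABCBench, a benchmark of 30 tests derived from validated security and privacy human-subject studies. It scores the alignment between LLM simulations and human results on a 0-100 scale across three dimensions: Attitude, Behavior, and Coherence.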

Findings

01 All twelve evaluated models average between 50 and 64 on the 0-100 alignment scale, leaving substantial room for improvement.

02 Larger models do not consistently outperform smaller ones.

03 Certain configurations achieve high alignment, e.g., scores above 95 with specific prompting strategies.

Abstract

A growing body of research assumes that large language model (LLM) agents can serve as proxies for how people form attitudes toward and behave in response to security and privacy (S&P) threats. If correct, these simulations could offer a scalable way to forecast S&P risks in products prior to deployment. We interrogate this assumption using SP-ABCBench, a new benchmark of 30 tests derived from validated S&P human-subject studies, which measures alignment between simulations and human-subjects studies on a 0-100 ascending scale, where higher scores indicate better alignment across three dimensions: Attitude, Behavior, and Coherence. Evaluating twelve LLMs, four persona construction strategies, and two prompting methods, we found that there remains substantial room for improvement: all models score between 50 and 64 on average. Newer, bigger, and smarter models do not reliably do better…

Taxonomy

Topics: AI in Service Interactions · Digital Mental Health Interventions · Persona Design and Applications