Prompt Perturbations Reveal Human-Like Biases in Large Language Model Survey Responses
Jens Rupprecht, Georg Ahnert, Markus Strohmaier

TL;DR
This study examines how large language models respond to survey question perturbations, revealing their susceptibility to biases like recency bias and emphasizing the importance of prompt robustness for reliable survey simulations.
Contribution
The paper systematically tests nine LLMs on World Values Survey questions under ten perturbations, uncovering biases and vulnerabilities that affect their use in social science research.
Findings
All models exhibit recency bias, favoring the last answer option.
Larger models are generally more robust to perturbations.
Models remain sensitive to semantic changes like paraphrasing.
Abstract
Large Language Models (LLMs) are increasingly used as proxies for human subjects in social science surveys, but their reliability and susceptibility to known human-like response biases, such as central tendency, opinion floating, and primacy bias, are poorly understood. This work investigates the response robustness of LLMs in normative survey contexts. We test nine LLMs on questions from the World Values Survey (WVS), applying a comprehensive set of ten perturbations to both question phrasing and answer option structure, resulting in over 167,000 simulated survey interviews. In doing so, we not only reveal LLMs' vulnerabilities to perturbations but also show that all tested models exhibit a consistent recency bias, disproportionately favoring the last-presented answer option. While larger models are generally more robust, all models remain sensitive to semantic variations like…
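One way to picture the answer-order perturbation and the recency measurement is the minimal Python sketch below. It is not the authors' pipeline: the function names `reverse_options` and `last_option_rate`, the answer wording, and the four example trials are all hypothetical and made up for illustration. The uniform baseline is also an assumption, since real response distributions are rarely uniform.

```python
def reverse_options(options):
    """Return the answer options in reversed presentation order."""
    return list(reversed(options))


def last_option_rate(trials):
    """Share of trials in which the chosen answer was the last one presented.

    Each trial is a (presented_options, chosen_option) pair.
    """
    if not trials:
        raise ValueError("need at least one trial")
    hits = sum(1 for presented, chosen in trials if presented[-1] == chosen)
    return hits / len(trials)


# Hypothetical answer scale (illustrative wording, not taken from the paper).
options = [
    "Very important",
    "Rather important",
    "Not very important",
    "Not at all important",
]

# Hypothetical model choices under original and reversed option order.
trials = [
    (options, "Not at all important"),
    (reverse_options(options), "Very important"),
    (options, "Rather important"),
    (reverse_options(options), "Very important"),
]

rate = last_option_rate(trials)
baseline = 1 / len(options)  # assumed uniform baseline, for comparison only
print(f"last-option rate: {rate:.2f} (uniform baseline: {baseline:.2f})")
```

Presenting the same question in both orders separates position from content: a model that picks the last-presented option under both orders is responding to where the option sits rather than to what it says.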
