Do LLMs exhibit human-like response biases? A case study in survey design
Lindia Tjuatja, Valerie Chen, Sherry Tongshuang Wu, Ameet Talwalkar, Graham Neubig

TL;DR
This study examines whether large language models exhibit human-like response biases in survey design, revealing that most models do not accurately reflect human sensitivities and highlighting limitations in using LLMs as human proxies.
Contribution
The paper introduces a dataset and framework for evaluating whether LLMs exhibit human-like response biases in survey questionnaires, and finds that most models fail to replicate human sensitivity to changes in prompt wording.
Findings
Most LLMs do not exhibit human-like response biases.
Models with RLHF are less sensitive to prompt variations.
LLMs respond to perturbations differently than humans.
Abstract
As large language models (LLMs) become more capable, there is growing excitement about the possibility of using LLMs as proxies for humans in real-world tasks where subjective labels are desired, such as in surveys and opinion polling. One widely-cited barrier to the adoption of LLMs as proxies for humans in subjective tasks is their sensitivity to prompt wording - but interestingly, humans also display sensitivities to instruction changes in the form of response biases. We investigate the extent to which LLMs reflect human response biases, if at all. We look to survey design, where human response biases caused by changes in the wordings of "prompts" have been extensively explored in social psychology literature. Drawing from these works, we design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires. Our comprehensive evaluation…
Taxonomy
Topics: Psychology of Social Influence · Survey Methodology and Nonresponse
