How Many Human Survey Respondents is a Large Language Model Worth? An Uncertainty Quantification Perspective
Chengpiao Huang, Yuhang Wu, Kaizheng Wang

TL;DR
This paper introduces a framework for converting large language model (LLM) simulated survey responses into reliable confidence sets for human population parameters, accounting for the uncertainty induced by human-LLM misalignment.
Contribution
It proposes an adaptive, data-driven method for selecting the number of simulated responses so that the resulting confidence sets achieve nominal average-case coverage, and it uses the selected number to quantify the LLM's simulation fidelity.
Findings
Adaptive sample size selection improves confidence set reliability.
The method reveals varying LLM fidelity across domains.
The selected sample size indicates the effective human population size.
Abstract
Large language models (LLMs) are increasingly used to simulate survey responses, but synthetic data can be misaligned with the human population, leading to unreliable inference. We develop a general framework that converts LLM-simulated responses into reliable confidence sets for population parameters of human responses, quantifying the uncertainty induced by the human-LLM misalignment. The key design choice is the number of simulated responses: too many produce overly narrow sets with poor coverage, while too few yield overly wide and uninformative sets dominated by stochastic noise. We propose a data-driven approach that adaptively selects the simulation sample size to achieve nominal average-case coverage, regardless of the LLM's simulation fidelity or the confidence set construction procedure. The selected sample size is further shown to reflect the effective human population size…
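The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' exact procedure. It assumes a set of calibration questions with binary answers, treats the human mean for each question as known, and uses a Hoeffding-style interval as a stand-in for the confidence set construction. The names `hoeffding_interval` and `select_sample_size` are hypothetical. The sketch scans the simulation sample size k upward and stops at the first k where too many intervals miss the human value.

```python
import math

def hoeffding_interval(sim, k, alpha):
    """Hoeffding-style interval from the first k simulated binary responses.

    Assumption: a stand-in for whatever confidence set procedure is used.
    """
    mean = sum(sim[:k]) / k
    half_width = math.sqrt(math.log(2.0 / alpha) / (2.0 * k))
    return mean - half_width, mean + half_width

def select_sample_size(human_means, sims, alpha=0.05):
    """Return the largest k such that, for every k' <= k, the fraction of
    calibration questions whose interval misses the human mean is <= alpha.

    human_means: one human population mean per question (assumed known here).
    sims: one list of simulated 0/1 responses per question.
    """
    k_max = min(len(s) for s in sims)
    selected = 0
    for k in range(1, k_max + 1):
        misses = 0
        for theta, sim in zip(human_means, sims):
            lo, hi = hoeffding_interval(sim, k, alpha)
            if not (lo <= theta <= hi):
                misses += 1
        if misses / len(sims) > alpha:
            break  # intervals have become too narrow for this simulator
        selected = k
    return selected
```

In this toy setting, a simulator whose responses are centered on the human mean keeps its intervals valid for every k, so the scan runs to the end of the pool. A simulator with a fixed bias is cut off once the interval half-width shrinks below that bias, which is how a small selected k can signal low fidelity.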
Taxonomy
Topics: Survey Methodology and Nonresponse · Human Mobility and Location-Based Analysis · Demographic Modeling and Climate Adaptation
