Stochastic Parrots or Singing in Harmony? Testing Five Leading LLMs for their Ability to Replicate a Human Survey with Synthetic Data
Jason Miklian, Kristian Hoelscher, John E. Katsos

TL;DR
This study compares human survey responses with synthetic data generated by five leading LLMs. It finds that while the models produce plausible responses, they fail to capture the counterintuitive insights found in the human data, which limits their use in social research.
Contribution
The paper systematically evaluates how well LLMs replicate human survey responses, highlights their limitations, and proposes standards for responsible synthetic data use.
Findings
AI models produce plausible but not insightful responses.
Deviations from human data are consistent across models.
Synthetic data is better for identifying societal assumptions than capturing human beliefs.
Abstract
How well can AI-derived synthetic research data replicate the responses of human participants? An emerging literature has begun to engage with this question, which carries deep implications for organizational research practice. This article compares a human-respondent survey of 420 Silicon Valley coders and developers with synthetic survey data, designed to simulate real survey takers, generated by five leading generative AI large language models: ChatGPT Thinking 5 Pro, Claude Sonnet 4.5 Pro plus Claude CoWork 1.123, Gemini Advanced 2.5 Pro, Incredible 1.0, and DeepSeek 3.2. Our findings reveal that while AI agents produced technically plausible results that lean more towards replicability and harmonization than assumed, none were able to capture the counterintuitive insights that made the human survey valuable. Moreover, deviations grouped together for all models,…
