Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina
Yuan Gao, Dokyun Lee, Gordon Burtch, Sina Fazelpour

TL;DR
This paper critically evaluates the use of large language models as human surrogates in social science, revealing significant limitations in their reasoning capabilities and cautioning against overreliance for behavioral studies.
Contribution
It provides an empirical assessment of LLMs' reasoning depth using the 11-20 money request game. It shows that LLMs fail to replicate the distribution of human choices and argues for caution in treating them as human surrogates.
Findings
LLMs often fail to match human behavior in economic experiments.
Failures are diverse and depend on input language, roles, and safeguards.
Advanced LLMs do not reliably simulate human reasoning in the tested scenario.
Abstract
Recent studies suggest large language models (LLMs) can exhibit human-like reasoning, aligning with human behavior in economic experiments, surveys, and political discourse. This has led many to propose that LLMs can be used as surrogates or simulations for humans in social science research. However, LLMs differ fundamentally from humans: they rely on probabilistic patterns and lack the embodied experiences and survival objectives that shape human cognition. We assess the reasoning depth of LLMs using the 11-20 money request game. Nearly all advanced approaches fail to replicate human behavior distributions across many models. Causes of failure are diverse and unpredictable, relating to input language, roles, and safeguarding. These results advise caution when using LLMs to study human behavior or as human surrogates or simulations.
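The abstract names the 11-20 money request game but does not state its rules. The sketch below assumes the standard formulation introduced by Arad and Rubinstein: each player requests an integer amount from 11 to 20, receives that amount, and earns a bonus of 20 if the request is exactly one less than the opponent's. Under a level-k reading, a level-0 player asks for 20 and each higher level best-responds to the level below, so a request of 20 - k suggests k steps of reasoning. The constants and function names (`payoff`, `best_response`, `level_k_choice`) are hypothetical illustrations. They are not the authors' code or their exact analysis.

```python
# Hypothetical sketch of the 11-20 money request game, assuming the standard
# rules: request an integer in [11, 20], receive it, plus a bonus of 20 if
# the request is exactly one below the opponent's.
MIN_ASK, MAX_ASK, BONUS = 11, 20, 20


def payoff(own: int, other: int) -> int:
    """Payoff to a player requesting `own` when the opponent requests `other`."""
    if not (MIN_ASK <= own <= MAX_ASK and MIN_ASK <= other <= MAX_ASK):
        raise ValueError("requests must be integers in [11, 20]")
    return own + (BONUS if own == other - 1 else 0)


def best_response(other: int) -> int:
    """Payoff-maximizing request against a known opponent request (larger request wins ties)."""
    return max(range(MIN_ASK, MAX_ASK + 1), key=lambda own: (payoff(own, other), own))


def level_k_choice(k: int) -> int:
    """Level-0 naively requests 20; level-k best-responds to level-(k-1)."""
    choice = MAX_ASK
    for _ in range(k):
        choice = best_response(choice)
    return choice


if __name__ == "__main__":
    # Requests implied by reasoning depths 0 through 3.
    print([level_k_choice(k) for k in range(4)])
```

Running it prints `[20, 19, 18, 17]`. Because a player facing a request of 11 cannot undercut it, the best response jumps back to 20. The level-k chain therefore bottoms out at 11 after nine steps. Comparing how often a model's requests fall at each value with the corresponding human frequencies is one way to judge whether it reproduces the human choice distribution.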
Taxonomy
Topics: Law, AI, and Intellectual Property
