TL;DR
This paper introduces a process-oriented evaluation framework to assess how well large language models simulate human decision-making variability and adaptability in social science tasks, revealing persistent behavioral gaps.
Contribution
It proposes a framework of three progressive interventions (Intrinsicality, Instruction, and Imitation) for evaluating the behavioral fidelity of LLMs in decision-making, and uses it to show where LLMs fail to replicate the variability of human strategies.
Findings
By default, LLMs converge on stable, conservative strategies that diverge from observed human behavior.
Risk instructions influence LLM behavior but do not produce human-like diversity.
In-context learning reduces, but does not eliminate, the behavioral gap.
Abstract
Large language models (LLMs) are increasingly used in social science simulations. While their performance on reasoning and optimization tasks has been extensively evaluated, less attention has been paid to their ability to simulate the variability and adaptability of human decision-making. We propose a process-oriented evaluation framework with progressive interventions (Intrinsicality, Instruction, and Imitation) to examine how LLM agents adapt under different levels of external guidance and human-derived noise. We validate the framework on two classic economics tasks, irrationality in the second-price auction and decision bias in the newsvendor problem, showing behavioral gaps between LLMs and humans. We find that LLMs, by default, converge on stable and conservative strategies that diverge from observed human behaviors. Risk-framed instructions impact LLM behavior predictably but do not…
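For readers unfamiliar with the two tasks named in the abstract, the sketch below shows the textbook rational benchmark for each: the payoff rule of a sealed-bid second-price auction, and the critical-fractile order quantity of the newsvendor problem. It is only an illustration of the standard models, not the paper's experimental setup. The function names, the tie-breaking rule, the normal demand assumption, the zero salvage value, and all numbers are assumptions introduced here.

```python
from statistics import NormalDist


def second_price_payoff(my_bid, my_value, other_bids):
    """Payoff to one bidder in a sealed-bid second-price auction.

    The highest bidder wins and pays the highest competing bid.
    Assumption for simplicity: a tie counts as a loss.
    """
    highest_other = max(other_bids)
    if my_bid > highest_other:
        return my_value - highest_other
    return 0.0


def newsvendor_order(price, cost, mean, sd):
    """Expected-profit-maximizing order quantity.

    Assumptions: normally distributed demand and zero salvage value,
    so the critical fractile is (price - cost) / price.
    """
    critical_fractile = (price - cost) / price
    return NormalDist(mean, sd).inv_cdf(critical_fractile)


# Hypothetical numbers: overbidding never helps and can hurt.
truthful = second_price_payoff(10, 10, [6, 11])   # loses the auction, payoff 0
overbid = second_price_payoff(12, 10, [6, 11])    # wins but pays 11 for a value of 10

# Hypothetical numbers: a high margin pushes the order above mean demand,
# a low margin pushes it below.
high_margin_order = newsvendor_order(price=12, cost=3, mean=100, sd=20)
low_margin_order = newsvendor_order(price=12, cost=9, mean=100, sd=20)
```

Under these assumptions, bidding one's true value is weakly dominant in the auction, and the newsvendor order depends only on the margin and the demand distribution. Systematic departures from these benchmarks are the kind of behavior the abstract refers to as irrationality and decision bias.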