How Far Are LLMs from Believable AI? A Benchmark for Evaluating the Believability of Human Behavior Simulation
Yang Xiao, Yi Cheng, Jinlan Fu, Jiashuo Wang, Wenjie Li, Pengfei Liu

TL;DR
This paper introduces SimulateBench, a systematic benchmark for evaluating the believability of LLMs in simulating human behaviors, focusing on consistency and robustness across diverse character profiles.
Contribution
The work presents a novel benchmark, SimulateBench, for assessing LLMs' ability to simulate human behaviors convincingly and reliably, addressing a gap in systematic evaluation methods.
Findings
Current LLMs often fail to maintain character consistency.
LLMs show vulnerability to behavioral perturbations.
Performance varies significantly across different models.
Abstract
In recent years, AI has demonstrated remarkable capabilities in simulating human behaviors, particularly systems implemented with large language models (LLMs). However, due to the lack of systematic evaluation of LLMs' simulated behaviors, the believability of LLMs among humans remains ambiguous, i.e., it is unclear which behaviors of LLMs are convincingly human-like and which need further improvement. In this work, we design SimulateBench to evaluate the believability of LLMs when simulating human behaviors. Specifically, we evaluate the believability of LLMs along two critical dimensions: 1) consistency: the extent to which LLMs behave consistently with the information given about the human being simulated; and 2) robustness: the ability of LLMs' simulated behaviors to remain stable when faced with perturbations. SimulateBench includes 65 character profiles and a total of 8,400 questions…
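To make the two dimensions concrete, here is a minimal Python sketch of how they might be scored. It is an illustration under stated assumptions, not the paper's actual metric: it assumes answers are compared by exact match against profile-derived gold answers, and that robustness is summarized as the worst drop in consistency after a profile is perturbed. The function names `consistency_score` and `robustness_drop` are hypothetical.

```python
from typing import Sequence


def consistency_score(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of questions whose answer matches the profile-derived gold answer.

    Assumption: exact-match scoring; the benchmark's own scoring may differ.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("at least one question is required")
    matches = sum(p == g for p, g in zip(predictions, gold))
    return matches / len(gold)


def robustness_drop(base_score: float, perturbed_scores: Sequence[float]) -> float:
    """Largest drop in consistency when the profile is perturbed.

    Assumption: a smaller value means more robust behavior; 0.0 means no
    perturbed variant scored lower than the original profile.
    """
    if not perturbed_scores:
        raise ValueError("at least one perturbed score is required")
    return max(0.0, max(base_score - s for s in perturbed_scores))


if __name__ == "__main__":
    # Hypothetical answers to four multiple-choice questions about one character.
    base = consistency_score(["A", "B", "C", "D"], ["A", "B", "D", "D"])
    print(base)  # 0.75
    # Hypothetical consistency scores after two profile perturbations.
    print(robustness_drop(base, [0.75, 0.5]))  # 0.25
```

Reporting the worst-case drop rather than an average is a design choice made here for simplicity; it highlights the single perturbation that most degrades the simulated character.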
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Explainable Artificial Intelligence (XAI)
Methods: ALIGN
