How Far are LLMs from Being Our Digital Twins? A Benchmark for Persona-Based Behavior Chain Simulation
Rui Li, Heming Xia, Xinfeng Yuan, Qingxiu Dong, Lei Sha, Wenjie Li, Zhifang Sui

TL;DR
This paper introduces BehaviorChain, a new benchmark to evaluate how well large language models can simulate continuous human behavior based on personas, revealing current models' limitations in this task.
Contribution
The paper presents the first benchmark for persona-based behavior chain simulation, including a large dataset and evaluation framework for assessing LLMs' human behavior simulation capabilities.
Findings
State-of-the-art models struggle with accurate behavior simulation.
BehaviorChain contains 15,846 behaviors across 1,001 personas.
Current LLMs have significant room for improvement in behavior continuity.
Abstract
Recently, LLMs have garnered increasing attention across academic disciplines for their potential as human digital twins, virtual proxies designed to replicate individuals and autonomously perform tasks such as decision-making, problem-solving, and reasoning on their behalf. However, current evaluations of LLMs primarily emphasize dialogue simulation while overlooking human behavior simulation, which is crucial for digital twins. To address this gap, we introduce BehaviorChain, the first benchmark for evaluating LLMs' ability to simulate continuous human behavior. BehaviorChain comprises diverse, high-quality, persona-based behavior chains, totaling 15,846 distinct behaviors across 1,001 unique personas, each with detailed history and profile metadata. For evaluation, we integrate persona metadata into LLMs and employ them to iteratively infer contextually appropriate behaviors within…
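The iterative evaluation mentioned at the end of the abstract could be sketched roughly as follows. This is an illustration only, not the authors' implementation: the `Persona` container, the `predict` callable, exact-match scoring, and feeding the model's own earlier predictions back in as history are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Persona:
    """Hypothetical container for persona metadata (profile plus history)."""
    profile: str
    history: str


# Hypothetical model interface:
# (persona, behaviors predicted so far, current context) -> next behavior
PredictFn = Callable[[Persona, Sequence[str], str], str]


def simulate_chain(
    persona: Persona,
    contexts: Sequence[str],
    gold_behaviors: Sequence[str],
    predict: PredictFn,
) -> Tuple[List[str], float]:
    """Predict one behavior per context, in order, and score by exact match.

    Assumption: each step sees the model's own earlier predictions,
    not the gold behaviors.
    """
    if len(contexts) != len(gold_behaviors):
        raise ValueError("contexts and gold_behaviors must have equal length")

    predicted: List[str] = []
    correct = 0
    for context, target in zip(contexts, gold_behaviors):
        behavior = predict(persona, tuple(predicted), context)
        predicted.append(behavior)
        correct += int(behavior == target)

    accuracy = correct / len(gold_behaviors) if gold_behaviors else 0.0
    return predicted, accuracy
```

In practice `predict` would wrap a call to an LLM with the persona metadata in the prompt; any stub with the same signature can stand in for it when testing the loop.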
Taxonomy
Topics: Persona Design and Applications · Technology Use by Older Adults · Innovation, Technology, and Society
Methods: Softmax · Attention Is All You Need
