Psychometric Comparability of LLM-Based Digital Twins
Yufei Zhang, Zhihao Ma

TL;DR
This study evaluates the psychometric validity of using large language models as digital twins of humans, finding high population-level accuracy but systematic divergences in finer-grained psychometric properties.
Contribution
The paper introduces a construct-validity framework for benchmarking LLM-based digital twins against human gold standards across tasks and features, highlighting both their limitations and their potential.
Findings
Digital twins achieve high population-level accuracy.
Within-participant profile correlations are strong.
Item-level correlations are attenuated.
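The three findings above measure agreement at different levels, so they can diverge on the same data. The sketch below is a minimal illustration, not the authors' procedure: the response matrices are invented toy data, and the metric definitions (mean absolute difference of item means, mean per-participant Pearson correlation across items, mean per-item Pearson correlation across participants) are assumptions about how such quantities are commonly computed.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def columns(matrix):
    """Transpose a participants-by-items matrix into per-item lists."""
    return [list(col) for col in zip(*matrix)]

# Hypothetical toy data: rows are participants, columns are items (1-5 scale).
humans = [
    [1, 2, 4, 5],
    [2, 2, 5, 4],
    [1, 3, 4, 5],
    [2, 3, 5, 5],
]
twins = [
    [1, 2, 4, 5],
    [1, 3, 4, 5],
    [2, 2, 5, 4],
    [2, 3, 5, 5],
]

# Population level (assumed definition): how far apart are the item means?
pop_mae = mean(
    abs(mean(h) - mean(t)) for h, t in zip(columns(humans), columns(twins))
)

# Profile level: for each participant, correlate human and twin answers across items.
mean_profile_r = mean(pearson(h, t) for h, t in zip(humans, twins))

# Item level: for each item, correlate human and twin answers across participants.
mean_item_r = mean(
    pearson(h, t) for h, t in zip(columns(humans), columns(twins))
)

print(f"population MAE of item means: {pop_mae:.3f}")
print(f"mean profile correlation:     {mean_profile_r:.3f}")
print(f"mean item-level correlation:  {mean_item_r:.3f}")
```

In this toy example the twins match every item mean and track each participant's overall profile, because items differ strongly in their average level. They still fail to preserve who scores higher than whom on a given item, which is what the item-level correlation captures.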
Abstract
Large language models (LLMs) are used as "digital twins" to replace human respondents, yet their psychometric comparability to humans is uncertain. We propose a construct-validity framework spanning construct representation and the nomological net, benchmarking digital twins against human gold standards across models and tasks, and testing how person-specific inputs shape performance. Across studies, digital twins achieved high population-level accuracy and strong within-participant profile correlations, alongside attenuated item-level correlations. In word association tests, LLM-based networks show small-world structure and theory-consistent communities similar to humans, yet diverge lexically and in local structure. In decision-making and contextualized tasks, digital twins under-reproduce heuristic biases, showing normative rationality, compressed variance and limited sensitivity to…
Taxonomy
Topics: Mental Health via Writing · Personality Traits and Psychology · Digital Mental Health Interventions
