Validated Hypotheses as a Lens for Human-Likeness Evaluation in AI Agents

Xuan Liu; HaoYang Shang; Zizhang Liu; Yuanjun Feng; Guankai Zhai; Yunze Xiao; Yiwen Tu; Haojian Jin

arXiv:2605.15473 · cs.CY · May 18, 2026


TL;DR

This paper introduces an evaluation framework that uses validated social science hypotheses to assess how human-like AI agents are, emphasizing measures of human-likeness that are objective, decomposable, and scalable.

Contribution

The authors present HumanStudy-Bench, an open platform that converts published human-subject studies from the social sciences into reusable simulation environments for evaluating the human-likeness of AI agents.

Findings

01. Agent results range from full replication of the human findings to outright failure.

02. Agent design affects alignment more than model size does.

03. Alignment effects are non-monotonic.

Abstract

We propose using validated behavioral hypotheses as a lens for evaluating human-likeness in LLM-based agents. Our key idea is simple: If an agent is human-like, a population of such agents should reach the same inferential conclusion as the human population when run through the same experiment. Decades of social science have produced many such validated findings, each anchored to concrete experimental protocols and robustly established through independent replication. This yields an evaluation that is objective, decomposable, and scalable. We operationalize this lens through HumanStudy-Bench, an open platform that turns published human-subject studies into reusable simulation environments and administers the evaluation to configurable agents. It scores agent-human alignment on two metrics: the Probability Alignment Score (PAS) for inferential agreement and the Effect Consistency Score…
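The key idea, that a population of human-like agents should reach the same inferential conclusion as the human population, can be illustrated with a small sketch. It rests on several assumptions not taken from the paper: the human finding is summarized as the direction of a two-group mean difference at a fixed alpha, the agent population's responses arrive as two lists of numeric scores, and a two-sided permutation test stands in for whatever test the original study used. The function names are hypothetical, and this is not PAS or ECS, whose definitions are not given in this excerpt.

```python
import random


def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Hypothetical helper: returns (p_value, observed_difference).
    """
    rng = random.Random(seed)
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign responses to the two conditions
        left, right = pooled[:n_a], pooled[n_a:]
        diff = sum(left) / len(left) - sum(right) / len(right)
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_perm + 1), observed


def inferential_conclusion(treatment, control, alpha=0.05):
    """Hypothetical: reduce a two-group comparison to 'positive', 'negative', or 'null'."""
    p, diff = permutation_p_value(treatment, control)
    if p >= alpha:
        return "null"
    return "positive" if diff > 0 else "negative"


def same_conclusion(human_conclusion, agent_treatment, agent_control, alpha=0.05):
    """Hypothetical: does the agent population reproduce the human study's conclusion?"""
    return inferential_conclusion(agent_treatment, agent_control, alpha) == human_conclusion


# Illustrative data only: assume the human study reported a significant positive effect.
human = "positive"
agent_treatment = [7, 8, 6, 9, 8, 7, 8, 9]
agent_control = [4, 5, 3, 5, 4, 6, 5, 4]
print(same_conclusion(human, agent_treatment, agent_control))  # → True
```

A permutation test is used here only because it needs no distributional assumptions and runs on the standard library alone. Note that the comparison is made at the population level: individual agents are never scored against individual humans, only the conclusion drawn from the whole agent sample.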
