HumanStudy-Bench: Towards AI Agent Design for Participant Simulation
Xuan Liu, Haoyang Shang, Zizhang Liu, Xinyan Liu, Yunze Xiao, Yiwen Tu, Haojian Jin

TL;DR
This paper introduces HUMANSTUDY-BENCH, a benchmark and framework for designing and evaluating AI agents that simulate human participants in social science experiments, aiming to improve fidelity and reproducibility.
Contribution
The paper presents a novel agent-design framework and a comprehensive benchmark for reconstructing and evaluating human-subject experiments using LLM-based agents.
Findings
Successfully instantiated 12 foundational studies with over 6,000 trials.
Developed new metrics to quantify agreement between human and agent behaviors.
Reproduced original statistical procedures end-to-end in a shared runtime.
Abstract
Large language models (LLMs) are increasingly used as simulated participants in social science experiments, but their behavior is often unstable and highly sensitive to design choices. Prior evaluations frequently conflate base-model capabilities with experimental instantiation, obscuring whether outcomes reflect the model itself or the agent setup. We instead frame participant simulation as an agent-design problem over full experimental protocols, where an agent is defined by a base model and a specification (e.g., participant attributes) that encodes behavioral assumptions. We introduce HUMANSTUDY-BENCH, a benchmark and execution engine that orchestrates LLM-based agents to reconstruct published human-subject experiments via a Filter–Extract–Execute–Evaluate pipeline, replaying trial sequences and running the original analysis pipeline in a shared runtime that preserves the…
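The abstract's two central ideas, an agent defined as a base model plus a specification and a four-stage Filter, Extract, Execute, Evaluate pipeline, can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`AgentSpec`, `Agent`, `Study`, `run_pipeline`, `stub_respond`, `share_yes`) is hypothetical, the "model" is a deterministic stub rather than an LLM call, and the analysis function is a toy stand-in for a study's original statistical procedure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentSpec:
    """Hypothetical specification: participant attributes that encode behavioral assumptions."""
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Agent:
    """An agent is a base model paired with a specification."""
    base_model: str
    spec: AgentSpec
    respond: Callable[[str, AgentSpec], str]  # stand-in for a model call


@dataclass
class Study:
    """Hypothetical container for one published experiment."""
    name: str
    trials: List[str]
    analyze: Callable[[List[str]], float]  # stand-in for the original analysis
    has_full_protocol: bool = True


def filter_studies(studies: List[Study]) -> List[Study]:
    # Filter: keep only studies whose protocol can be reconstructed.
    return [s for s in studies if s.has_full_protocol]


def extract_trials(study: Study) -> List[str]:
    # Extract: recover the ordered trial sequence.
    return list(study.trials)


def execute(agent: Agent, trials: List[str]) -> List[str]:
    # Execute: replay each trial through the agent, in order.
    return [agent.respond(trial, agent.spec) for trial in trials]


def evaluate(study: Study, responses: List[str]) -> float:
    # Evaluate: run the study's analysis on the agent's responses.
    return study.analyze(responses)


def run_pipeline(agent: Agent, studies: List[Study]) -> Dict[str, float]:
    results = {}
    for study in filter_studies(studies):
        responses = execute(agent, extract_trials(study))
        results[study.name] = evaluate(study, responses)
    return results


def stub_respond(prompt: str, spec: AgentSpec) -> str:
    # Deterministic stub: the answer depends only on the specification.
    return "yes" if spec.attributes.get("risk_attitude") == "averse" else "no"


def share_yes(responses: List[str]) -> float:
    # Toy analysis: proportion of "yes" answers.
    return sum(r == "yes" for r in responses) / len(responses) if responses else 0.0


studies = [
    Study("toy_study", ["trial 1", "trial 2"], share_yes),
    Study("incomplete_study", [], share_yes, has_full_protocol=False),
]
averse = Agent("stub-model", AgentSpec({"risk_attitude": "averse"}), stub_respond)
seeking = Agent("stub-model", AgentSpec({"risk_attitude": "seeking"}), stub_respond)
results_averse = run_pipeline(averse, studies)
results_seeking = run_pipeline(seeking, studies)
```

The two agents share the same base model and differ only in their specification, so any difference between `results_averse` and `results_seeking` is attributable to the agent setup rather than the model, which is the distinction the abstract draws.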
Taxonomy
Topics: Computational and Text Analysis Methods · Explainable Artificial Intelligence (XAI) · Mobile Crowdsensing and Crowdsourcing
