Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies

Gati Aher; Rosa I. Arriaga; Adam Tauman Kalai

arXiv:2208.10264 · cs.CL · July 11, 2023 · 124 cites

TL;DR

This paper introduces Turing Experiments to evaluate how well large language models can simulate human behavior across various psychological and economic studies, revealing both capabilities and distortions.

Contribution

The paper proposes the Turing Experiment, a testing framework for assessing how well language models replicate human behavior in human subject research, and uses it to highlight their strengths and limitations.

Findings

01

Recent models replicate findings from classic experiments such as the Ultimatum Game and the Milgram Shock Experiment.

02

The paper identifies a hyper-accuracy distortion in some models that could affect downstream applications.

03

The paper demonstrates the utility of Turing Experiments for evaluating behavioral simulation.

Abstract

We introduce a new type of test, called a Turing Experiment (TE), for evaluating to what extent a given language model, such as GPT models, can simulate different aspects of human behavior. A TE can also reveal consistent distortions in a language model's simulation of a specific human behavior. Unlike the Turing Test, which involves simulating a single arbitrary individual, a TE requires simulating a representative sample of participants in human subject research. We carry out TEs that attempt to replicate well-established findings from prior studies. We design a methodology for simulating TEs and illustrate its use to compare how well different language models are able to reproduce classic economic, psycholinguistic, and social psychology experiments: Ultimatum Game, Garden Path Sentences, Milgram Shock Experiment, and Wisdom of Crowds. In the first three TEs, the existing findings…
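The core idea of a TE, simulating many participants rather than one and then aggregating their responses, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the participant names, the prompt template, the `run_te` helper, and `stub_model`, which stands in for a real language model call.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical participant names; a real TE would use a representative sample.
NAMES = ["Mr. Garcia", "Ms. Chen", "Mr. Smith", "Ms. Johnson"]


def build_prompt(proposer: str, responder: str, offer: int, total: int = 10) -> str:
    """Fill a hypothetical Ultimatum Game template for one simulated pair."""
    return (
        f"{proposer} was given ${total} and offered {responder} ${offer}, "
        f"keeping ${total - offer}. {responder} decided to"
    )


def run_te(model: Callable[[str], str], offers: List[int]) -> Dict[int, float]:
    """Return the acceptance rate per offer across all ordered name pairs."""
    rates: Dict[int, float] = {}
    for offer in offers:
        outcomes: Counter = Counter()
        for proposer in NAMES:
            for responder in NAMES:
                if proposer == responder:
                    continue
                completion = model(build_prompt(proposer, responder, offer))
                accepted = completion.strip().lower().startswith("accept")
                outcomes[accepted] += 1
        rates[offer] = outcomes[True] / sum(outcomes.values())
    return rates


def stub_model(prompt: str) -> str:
    """Stand-in for a language model: accepts any offer of $3 or more."""
    offer = int(prompt.split("offered")[1].split("$")[1].split(",")[0])
    return " accept" if offer >= 3 else " reject"


if __name__ == "__main__":
    print(run_te(stub_model, offers=[1, 3, 5]))
```

Swapping `stub_model` for a function that queries an actual language model would turn the per-offer acceptance rates into something that can be compared against rates reported in human studies, which is the kind of comparison a TE is meant to support.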

Code & Models

Models

🤗 recursivelabsai/model-evaluation-infrastructure (model)

Videos

Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies · slideslive

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Misinformation and Its Impacts

Methods: Multi-Head Attention · Attention Is All You Need · Discriminative Fine-Tuning · GPT · Test · Linear Layer · Cosine Annealing · Layer Normalization · Byte Pair Encoding · Linear Warmup With Cosine Annealing