How Well Do LLMs Predict Human Behavior? A Measure of their Pretrained Knowledge

Wayne Gao; Sukjin Han; Annie Liang

arXiv:2601.12343 · econ.EM · January 21, 2026

Open Access

TL;DR

This paper introduces the equivalent sample size, a measure of how much a pretrained large language model (LLM) knows about predicting human behavior: the amount of domain-specific data a flexible machine learning model needs to match the LLM's predictive accuracy.
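One way to make this definition precise, in notation chosen here for illustration (the symbols $\hat f_n$, $R$, $R_{\mathrm{LLM}}$, and $n^*$ are assumptions, not necessarily the paper's): let $R(\hat f_n)$ be the out-of-sample prediction error of a flexible model trained on $n$ domain-specific observations, and $R_{\mathrm{LLM}}$ the error of the fixed LLM on the same task. The equivalent sample size is then the smallest training size at which the trained model does at least as well as the LLM.

```latex
n^{*} \;=\; \inf\bigl\{\, n \ge 1 \;:\; \mathbb{E}\bigl[R(\hat f_n)\bigr] \le R_{\mathrm{LLM}} \,\bigr\}
```

Under this reading, the expectation averages over random training samples of size $n$, and a large $n^*$ means the LLM's pretrained knowledge is worth many task-specific observations.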

Contribution

It proposes the equivalent sample size as a new measure of the knowledge an LLM brings to a prediction task, and develops a statistical inference procedure based on a new asymptotic theory for cross-validated prediction error.
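For orientation, the quantity that the inference procedure targets, a cross-validated prediction error, can be sketched as below. This is a minimal sketch assuming squared-error loss and K-fold splits; the function name `kfold_cv_error` and its arguments are hypothetical. The standard error it returns is a naive one that treats the per-observation held-out losses as independent, which ignores the dependence cross-validation introduces. It does not reproduce the paper's asymptotic theory.

```python
import numpy as np

def kfold_cv_error(x, y, fit, predict, k=5, seed=0):
    """K-fold cross-validated mean squared error plus a naive standard error.

    `fit(x, y)` returns a fitted model; `predict(model, x)` returns predictions.
    The standard error treats the n held-out losses as i.i.d. (a simplification).
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)  # random, roughly equal folds
    losses = np.empty(n)
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)  # all indices not in this fold
        model = fit(x[train], y[train])
        # Squared error on observations the model did not see during fitting
        losses[held_out] = (y[held_out] - predict(model, x[held_out])) ** 2
    return losses.mean(), losses.std(ddof=1) / np.sqrt(n)
```

As a usage example, a cubic polynomial can stand in for the flexible model: `kfold_cv_error(x, y, lambda a, b: np.polyfit(a, b, 3), lambda c, a: np.polyval(c, a))`.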

Findings

01

LLMs encode considerable predictive information for some economic variables but much less for others.

02

The predictive value of LLMs varies across different domains.

03

The measure indicates how far LLMs can substitute for domain-specific data in a given domain.

Abstract

Large language models (LLMs) are increasingly used to predict human behavior. We propose a measure for evaluating how much knowledge a pretrained LLM brings to such a prediction: its equivalent sample size, defined as the amount of task-specific data needed to match the predictive accuracy of the LLM. We estimate this measure by comparing the prediction error of a fixed LLM in a given domain to that of flexible machine learning models trained on increasing samples of domain-specific data. We further provide a statistical inference procedure by developing a new asymptotic theory for cross-validated prediction error. Finally, we apply this method to the Panel Study of Income Dynamics. We find that LLMs encode considerable predictive information for some economic variables but much less for others, suggesting that their value as substitutes for domain-specific data differs markedly across…
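The estimation step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: squared-error loss, one fixed held-out test set, a grid of training sizes, random subsamples of each size, and linear interpolation between grid points. The LLM's test error is passed in as a number (hypothetical `llm_error`), and the names `learning_curve` and `equivalent_sample_size` are hypothetical.

```python
import numpy as np

def learning_curve(x, y, x_test, y_test, fit, predict, sizes, reps=20, seed=0):
    """Average held-out MSE of a model trained on random subsamples of each size."""
    rng = np.random.default_rng(seed)
    curve = []
    for n in sizes:
        errs = []
        for _ in range(reps):
            idx = rng.choice(len(y), size=n, replace=False)  # subsample without replacement
            model = fit(x[idx], y[idx])
            errs.append(np.mean((y_test - predict(model, x_test)) ** 2))
        curve.append(np.mean(errs))  # average over repeated subsamples
    return np.array(curve)

def equivalent_sample_size(sizes, curve, llm_error):
    """Smallest training size at which the learning curve reaches llm_error.

    Interpolates linearly between grid points. Returns sizes[0] if the curve
    already matches the LLM at the first size, and None if it never does.
    """
    sizes = np.asarray(sizes, dtype=float)
    if curve[0] <= llm_error:
        return sizes[0]
    for i in range(1, len(sizes)):
        if curve[i] <= llm_error:
            # curve[i - 1] > llm_error >= curve[i], so the denominator is positive
            frac = (curve[i - 1] - llm_error) / (curve[i - 1] - curve[i])
            return sizes[i - 1] + frac * (sizes[i] - sizes[i - 1])
    return None
```

For example, with `sizes=[10, 20, 40, 80]`, a learning curve of `[0.5, 0.3, 0.2, 0.15]`, and `llm_error=0.25`, the crossing lies halfway between 20 and 40, so the sketch returns about 30. Interpolation is one simple choice; reporting the first grid size that beats the LLM would be a coarser alternative.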
