A large-scale evaluation of commonsense knowledge in humans and large language models

Tuan Dung Nguyen; Duncan J. Watts; Mark E. Whiting

arXiv:2505.10309 · cs.AI · January 23, 2026

Open Access

TL;DR

This paper evaluates the commonsense knowledge of large language models by comparing their judgments with those of a diverse human population. Most models score below the human median and correlate only modestly with human judgments, and smaller, open models outperform larger, proprietary ones.

Contribution

The paper introduces an evaluation framework that accounts for human heterogeneity by assessing LLMs' commonsense knowledge relative to a diverse human population rather than against a single set of prescribed labels.

Findings

01. Most LLMs score below the human median in commonsense competence.

02. LLMs show only modest correlation with human judgments.

03. Smaller, open models outperform larger, proprietary models in this evaluation.

Abstract

Commonsense knowledge, a major constituent of artificial intelligence (AI), is primarily evaluated in practice by human-prescribed ground-truth labels. An important, albeit implicit, assumption of these labels is that they accurately capture what any human would think, effectively treating human common sense as homogeneous. However, recent empirical work has shown that humans vary enormously in what they consider commonsensical; thus what appears self-evident to one benchmark designer may not be so to another. Here, we propose a method for assessing commonsense knowledge in AI, specifically in large language models (LLMs), that incorporates empirically observed heterogeneity among humans by measuring the correspondence between a model's judgment and that of a human population. We first find that, when treated as independent survey respondents, most LLMs remain below the human median in…
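The idea of treating a model as one more survey respondent can be illustrated with a minimal sketch. It is not the paper's metric: it assumes binary agree/disagree answers, scores each respondent by how often they match the majority of the other humans, and then places the model's score against the human median. All names (`majority`, `agreement_score`, `human_scores`) and the toy data are hypothetical.

```python
from statistics import median


def majority(votes):
    """Return True if strictly more than half of the votes are True (ties count as False)."""
    return sum(votes) * 2 > len(votes)


def agreement_score(answers, others):
    """Fraction of statements where `answers` matches the majority of `others`."""
    matches = 0
    for i, answer in enumerate(answers):
        votes = [respondent[i] for respondent in others]
        if answer == majority(votes):
            matches += 1
    return matches / len(answers)


def human_scores(humans):
    """Leave-one-out score: each human is compared with the majority of the rest."""
    return [
        agreement_score(person, humans[:i] + humans[i + 1:])
        for i, person in enumerate(humans)
    ]


# Toy data (assumed): 5 humans and 1 model, each rating the same 4 statements.
humans = [
    [True, True, False, True],
    [True, False, False, True],
    [True, True, False, False],
    [False, True, False, True],
    [True, True, True, True],
]
model = [True, True, False, False]

model_score = agreement_score(model, humans)
scores = human_scores(humans)
human_median = median(scores)
share_below = sum(s < model_score for s in scores) / len(scores)

print(f"model score:        {model_score:.2f}")
print(f"human median score: {human_median:.2f}")
print(f"humans scoring below the model: {share_below:.0%}")
```

Scoring humans leave-one-out keeps a respondent from inflating their own score by contributing to the majority they are compared against. The model is scored against all humans, since it is not part of that population.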

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Sparse Evolutionary Training