Language in Vivo vs. in Silico: Size Matters but Larger Language Models Still Do Not Comprehend Language on a Par with Humans Due to Impenetrable Semantic Reference

Vittoria Dentella; Fritz Guenther; Evelina Leivada

arXiv:2404.14883 · cs.CL · June 30, 2025 · 2 cites

Open Access

TL;DR

This study compares large language models and humans on a grammaticality judgment task. The largest model tested, ChatGPT-4, is more accurate than humans but less stable in its answers, and the models still lack human-like language understanding, especially in semantics and sensitivity to grammaticality.

Contribution

The study demonstrates that increasing model size alone does not produce human-like language comprehension, and it highlights fundamental differences between humans and models in semantic processing.

Findings

1. ChatGPT-4 outperforms humans in accuracy on grammaticality judgments.
2. Humans are less accurate overall but more stable in their answers.
3. Scaling models does not fully bridge the comprehension gap.

Abstract

Understanding the limits of language is a prerequisite for Large Language Models (LLMs) to act as theories of natural language. LLM performance in some language tasks presents both quantitative and qualitative differences from that of humans; however, it remains to be determined whether such differences are amenable to model size. This work investigates the critical role of model scaling, determining whether increases in size make up for such differences between humans and models. We test three LLMs from different families (Bard, 137 billion parameters; ChatGPT-3.5, 175 billion; ChatGPT-4, 1.5 trillion) on a grammaticality judgment task featuring anaphora, center embedding, comparatives, and negative polarity. N=1,200 judgments are collected and scored for accuracy, stability, and improvements in accuracy upon repeated presentation of a prompt. Results of the best-performing LLM,…
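The abstract names three scoring dimensions: accuracy, stability, and improvement in accuracy upon repeated presentation. The sketch below shows one plausible way such scores could be computed from repeated judgments. It is an illustration only: the function name `score_judgments`, the tuple layout, and the definition of a "stable" item (all repetitions of the same sentence receive the same answer) are assumptions, not the authors' published scoring procedure.

```python
from collections import defaultdict


def score_judgments(judgments):
    """Score repeated grammaticality judgments (hypothetical helper).

    judgments: list of (item_id, is_grammatical, response) tuples, in
    presentation order. `response` is True when the respondent judged
    the sentence grammatical.
    """
    if not judgments:
        raise ValueError("no judgments to score")

    # Group correctness flags by sentence, preserving repetition order.
    by_item = defaultdict(list)
    for item_id, is_grammatical, response in judgments:
        by_item[item_id].append(response == is_grammatical)

    total = sum(len(flags) for flags in by_item.values())
    correct = sum(sum(flags) for flags in by_item.values())

    # Assumed definition: an item is stable when every repetition got the
    # same answer. The gold label is fixed per item, so constant
    # correctness is equivalent to a constant response.
    stable_items = sum(1 for flags in by_item.values() if len(set(flags)) == 1)

    # Accuracy at each repetition position, to see whether it improves.
    max_reps = max(len(flags) for flags in by_item.values())
    accuracy_by_repetition = []
    for k in range(max_reps):
        hits = [flags[k] for flags in by_item.values() if len(flags) > k]
        accuracy_by_repetition.append(sum(hits) / len(hits))

    return {
        "accuracy": correct / total,
        "stability": stable_items / len(by_item),
        "accuracy_by_repetition": accuracy_by_repetition,
    }


# Toy data, invented for illustration (not from the paper).
toy = [
    ("s1", True, True), ("s1", True, True), ("s1", True, True),
    ("s2", False, True), ("s2", False, False), ("s2", False, False),
]
result = score_judgments(toy)
```

On this toy input, the second sentence receives an oscillating answer, so it counts as unstable even though two of its three judgments are correct. This is the kind of dissociation between accuracy and stability that the findings describe.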

Taxonomy

Topics: Language Development and Disorders · Language and cultural evolution