Large language models are not about natural language

Johan J. Bolhuis; Andrea Moro; Stephen Crain; Sandiway Fong

arXiv:2512.13441·cs.CL·December 17, 2025

Open Access

TL;DR

This paper argues that large language models are useless for linguistics: they are probabilistic models that need vast amounts of data to analyse externalized strings of words, whereas human language rests on a mind-internal computational system that recursively generates hierarchical structures.

Contribution

The paper contrasts data-driven language models with human linguistic cognition and argues that this difference prevents large language models from yielding linguistic insight.

Findings

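01 Large language models require vast amounts of external data.

02 Human language relies on a mind-internal system that recursively generates hierarchical structures.

03 Unlike the human language system, large language models cannot readily distinguish real languages from impossible ones.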

Abstract

Large Language Models are useless for linguistics, as they are probabilistic models that require a vast amount of data to analyse externalized strings of words. In contrast, human language is underpinned by a mind-internal computational system that recursively generates hierarchical thought structures. The language system grows with minimal external input and can readily distinguish between real language and impossible languages.

Taxonomy

Topics: Language and cultural evolution · Natural Language Processing Techniques · Machine Learning and Algorithms