Large language models are not about natural language
Johan J. Bolhuis, Andrea Moro, Stephen Crain, Sandiway Fong

TL;DR
This paper argues that large language models are of no use to linguistics: they are data-driven probabilistic models of externalized word strings, whereas human language rests on a mind-internal computational system that recursively generates hierarchical structures.
Contribution
The paper highlights the fundamental differences between data-driven language models and human linguistic cognition, emphasizing the limits of large language models as a source of linguistic insight.
Findings
Large language models depend heavily on external data.
Human language rests on a mind-internal recursive system that grows with minimal external input.
Unlike the human language system, language models cannot distinguish real languages from impossible ones.
Abstract
Large Language Models are useless for linguistics, as they are probabilistic models that require a vast amount of data to analyse externalized strings of words. In contrast, human language is underpinned by a mind-internal computational system that recursively generates hierarchical thought structures. The language system grows with minimal external input and can readily distinguish between real language and impossible languages.
Taxonomy
Topics: Language and cultural evolution · Natural Language Processing Techniques · Machine Learning and Algorithms
