# Community size rather than grammatical complexity better predicts Large Language Model accuracy in a novel Wug Test

**Authors:** Nikoleta Pantelidou, Evelina Leivada, Raquel Montero, Paolo Morosi, Wei Lun Wong

PMC · DOI: 10.1371/journal.pone.0343164 · 2026-03-11

## TL;DR

This study finds that the accuracy of large language models on a morphological generalization task with novel words tracks the size of a language's speaker community, and hence the amount of available training data, more closely than it tracks grammatical complexity.

## Contribution

The study introduces a multilingual adaptation of the Wug Test and uses it to compare the performance of six models with that of human speakers across four languages (Catalan, English, Greek, and Spanish).
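
To make the kind of comparison described above concrete, here is a minimal sketch of how per-language accuracy on a Wug-style task could be tallied. It assumes exact-match scoring after trimming whitespace and lowercasing, which may differ from the scoring used in the paper. The function name `accuracy_by_language` and the toy items are hypothetical, invented for illustration, and are not data or results from the study.

```python
from collections import defaultdict


def accuracy_by_language(responses):
    """Return the share of correct responses per language.

    `responses` is an iterable of (language, expected_form, produced_form).
    Scoring is exact match after trimming and lowercasing (an assumption,
    not necessarily the paper's criterion).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for language, expected, produced in responses:
        total[language] += 1
        if produced.strip().lower() == expected.strip().lower():
            correct[language] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


# Toy items invented for illustration only; not taken from the paper.
toy_responses = [
    ("English", "wugs", "wugs"),     # correct plural of a novel noun
    ("English", "blicks", "blick"),  # plural marker missing
    ("Spanish", "tupos", "tupos"),
    ("Spanish", "melas", "Melas "),  # matches after normalization
]

scores = accuracy_by_language(toy_responses)
print(scores)
```

The same tally could be run separately over model responses and human responses, so that the two sets of per-language scores can be compared side by side.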

## Key findings

- Model accuracy in morphological generalization aligns more with community size and data availability than with grammatical complexity.
- Languages with larger speaker communities, like Spanish and English, showed higher model accuracy.
- Model behavior resembles human linguistic competence only superficially; it is driven mainly by the richness of available linguistic resources rather than by sensitivity to grammatical complexity.

## Abstract

The linguistic abilities of Large Language Models are a matter of ongoing debate. This study contributes to this discussion by investigating model performance in a morphological generalization task that involves novel words. Using a multilingual adaptation of the Wug Test, six models were tested across four partially unrelated languages (Catalan, English, Greek, and Spanish) and compared with human speakers. The aim is to determine whether model accuracy approximates human competence and whether it is shaped primarily by linguistic complexity or by the size of the linguistic community, which affects the quantity of available training data. Consistent with previous research, the results show that the models are able to generalize morphological processes to unseen words with human-like accuracy. However, accuracy patterns align more closely with community size and data availability than with structural complexity, refining earlier claims in the literature. In particular, languages with larger speaker communities and stronger digital representation, such as Spanish and English, revealed higher accuracy than less-resourced ones like Catalan and Greek. Overall, our findings suggest that model behavior is mainly driven by the richness of linguistic resources rather than by sensitivity to grammatical complexity, reflecting a form of performance that resembles human linguistic competence only superficially.

## Full-text entities

- **Diseases:** cognitive, neurological, hearing, or speech-related impairments (MESH:D060825), cognitive fatigue (MESH:D005221), attention lapses (MESH:D001289)
- **Chemicals:** BERT (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12978473/full.md

---
Source: https://tomesphere.com/paper/PMC12978473