Evaluating Metalinguistic Knowledge in Large Language Models across the World's Languages
Tjaša Arčon (1), Matej Klemen (1), Marko Robnik-Šikonja (1), Kaja Dobrovoljc (1, 2, 3) ((1) University of Ljubljana, Faculty of Computer and Information Science, Slovenia, (2) University of Ljubljana, Faculty of Arts, Slovenia, (3) Jožef Stefan Institute, Ljubljana)

TL;DR
This paper evaluates the metalinguistic knowledge of large language models across 2,660 languages using the WALS dataset, revealing limited understanding that is heavily influenced by data availability and digital presence.
Contribution
It introduces a multilingual benchmark based on WALS features to assess LLMs' explicit linguistic knowledge across diverse languages.
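As an illustration of how a WALS feature could be turned into a multiple-choice item, here is a minimal sketch. The prompt wording and the helper names `build_question` and `gold_letter` are assumptions made for this example and are not taken from the paper; the value list is the one WALS uses for feature 81A (Order of Subject, Object and Verb).

```python
from string import ascii_uppercase


def build_question(language, feature_name, values):
    """Format one (language, feature) pair as a lettered multiple-choice prompt.

    The template below is hypothetical; the paper's actual wording may differ.
    """
    if not 2 <= len(values) <= len(ascii_uppercase):
        raise ValueError("need between 2 and 26 answer options")
    lines = [
        f"Language: {language}",
        f"Feature: {feature_name}",
        "Which value best describes this language?",
    ]
    # Pair each feature value with a letter: A, B, C, ...
    for letter, value in zip(ascii_uppercase, values):
        lines.append(f"{letter}. {value}")
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


def gold_letter(values, gold_value):
    """Return the letter assigned to the documented (gold) value."""
    return ascii_uppercase[values.index(gold_value)]


# Values of WALS feature 81A, in the order listed by WALS.
WORD_ORDER_VALUES = ["SOV", "SVO", "VSO", "VOS", "OVS", "OSV", "No dominant order"]

if __name__ == "__main__":
    print(build_question("Slovenian", "Order of Subject, Object and Verb", WORD_ORDER_VALUES))
    print("Gold answer:", gold_letter(WORD_ORDER_VALUES, "SVO"))
```

Keeping the options in a fixed order makes the gold letter reproducible; a real benchmark might instead shuffle options to reduce position bias, which this sketch does not attempt.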
Findings
GPT-4o achieves moderate accuracy (0.367)
Models perform above chance but below majority-class baseline
Performance correlates with digital language resources
Abstract
LLMs are routinely evaluated on language use, yet their explicit knowledge about linguistic structure remains poorly understood. Existing linguistic benchmarks focus on narrow phenomena, emphasize high-resource languages, and rarely test metalinguistic knowledge, that is, explicit reasoning about language structure. We present a multilingual evaluation of metalinguistic knowledge in LLMs, based on the World Atlas of Language Structures (WALS), which documents 192 linguistic features across 2,660 languages. We convert WALS features into natural-language multiple-choice questions and evaluate models across documented languages. Using accuracy and macro F1, and comparing to chance and majority-class baselines, we assess performance and analyse variation across linguistic domains and language-related factors. Results show limited metalinguistic knowledge: GPT-4o performs best but achieves moderate…
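The metrics and baselines named in the abstract can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the authors' evaluation code: the function names are hypothetical, the chance baseline is taken to be the mean of 1/k over questions with k options, and the majority-class baseline is computed over a single list of gold labels (for example, one feature at a time).

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of questions answered with the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = set(gold) | set(pred)
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def chance_accuracy(option_counts):
    """Expected accuracy of uniform random guessing (assumed definition)."""
    return sum(1 / k for k in option_counts) / len(option_counts)


def majority_accuracy(gold):
    """Accuracy of always predicting the most frequent gold label."""
    return Counter(gold).most_common(1)[0][1] / len(gold)


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    gold = ["SOV", "SOV", "SOV", "SVO", "VSO"]
    pred = ["SOV", "SVO", "SOV", "SVO", "SVO"]
    print("accuracy:", accuracy(gold, pred))
    print("macro F1:", macro_f1(gold, pred))
    print("chance:", chance_accuracy([7] * len(gold)))
    print("majority:", majority_accuracy(gold))
```

Macro F1 weights rare and frequent values equally, so a model that only predicts the dominant value can score well on accuracy yet poorly on macro F1. That is why the two metrics are worth reporting together when class distributions are skewed.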
Taxonomy
Topics: Natural Language Processing Techniques · Language and cultural evolution · Multilingual Education and Policy
