Multilingual Large Language Models do not comprehend all natural languages to equal degrees

Natalia Moskvina; Raquel Montero; Masaya Yoshida; Ferdy Hubers; Paolo Morosi; Walid Irhaymi; Jin Yan; Tamara Serrano; Elena Pagliarini; Fritz Günther; Evelina Leivada

arXiv:2602.20065·cs.CL·February 24, 2026

TL;DR

This study evaluates the comprehension abilities of three multilingual large language models across 12 languages from five language families. Performance varies by language, and the models often do better on some lower-resource languages than on English, which challenges the common assumption that English is the best-performing language.

Contribution

The paper provides a cross-linguistic assessment of LLM comprehension across 12 languages, highlighting uneven performance and the factors that influence it.

Findings

01 LLMs outperform human baselines in some languages

02 English is not always the best-performing language for LLMs

03 Performance varies significantly across language families and resource levels

Abstract

Large Language Models (LLMs) play a critical role in how humans access information. While their core use relies on comprehending written requests, our understanding of this ability is currently limited, because most benchmarks evaluate LLMs in high-resource languages predominantly spoken by Western, Educated, Industrialised, Rich, and Democratic (WEIRD) communities. The default assumption is that English is the best-performing language for LLMs, while smaller, low-resource languages are linked to less reliable outputs, even in multilingual, state-of-the-art models. To track variation in the comprehension abilities of LLMs, we prompt 3 popular models on a language comprehension task across 12 languages, representing the Indo-European, Afro-Asiatic, Turkic, Sino-Tibetan, and Japonic language families. Our results suggest that the models exhibit remarkable linguistic accuracy across…

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Big Data and Digital Economy