# One test, many tongues: Surveying language proficiency across the globe

**Authors:** Pol van Rijn, Yue Sun, Harin Lee, Raja Marjieh, Ilia Sucholutsky, Francesca Lanzarini, Elisabeth André, Nori Jacoby

PMC · DOI: 10.1073/pnas.2420179123 · Proceedings of the National Academy of Sciences of the United States of America · 2026-03-27

## TL;DR

This paper introduces a new method to automatically create language proficiency tests for 1,939 languages, validated with 4,137 participants across 34 countries.

## Contribution

A scalable, automated pipeline for generating language proficiency tests in thousands of languages.

## Key findings

- The test can distinguish native, second-language, and non-speakers within one minute.
- Linguistic and demographic factors systematically influence self-reported and actual language skills.
- Vocabulary test scores correlate strongly with other language competencies, such as listening and writing, across typologically varied languages.
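This summary does not say how responses are scored. The sketch below is an illustration only and is not the paper's procedure. It assumes a yes/no vocabulary format in which real words are mixed with pseudowords, as in LexTALE-style tests. Each trial is then scored with balanced accuracy: the mean of the hit rate on real words and the correct-rejection rate on pseudowords. With this score, a participant who answers "word" to everything lands at chance (0.5) and cannot inflate their result. The `Trial` type and the `balanced_accuracy` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    is_real_word: bool  # ground truth: real word (True) or pseudoword (False)
    said_word: bool     # participant's response: "this is a word"


def balanced_accuracy(trials: list[Trial]) -> float:
    """Hypothetical scoring rule: mean of hit rate and correct-rejection rate."""
    words = [t for t in trials if t.is_real_word]
    pseudo = [t for t in trials if not t.is_real_word]
    if not words or not pseudo:
        raise ValueError("need at least one real word and one pseudoword")
    hit_rate = sum(t.said_word for t in words) / len(words)
    correct_rejection_rate = sum(not t.said_word for t in pseudo) / len(pseudo)
    return (hit_rate + correct_rejection_rate) / 2


# A participant who answers "word" to everything scores at chance.
yes_to_all = [Trial(True, True)] * 4 + [Trial(False, True)] * 2
print(balanced_accuracy(yes_to_all))  # 0.5
```

Averaging the two rates, instead of pooling all trials, keeps the score at chance for response bias even when real words outnumber pseudowords.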

## Abstract

Measuring language proficiency is essential for research in many areas, including second language acquisition, psycholinguistics, and cognitive science. We propose a method to derive language proficiency tests from texts and apply it to generate new tests for 1,939 languages. We tested this method extensively in experiments with 4,137 participants. We used it to assess the linguistic background of speakers in their first and second languages in 34 languages across 34 countries, and to characterize how our test is influenced by linguistic and demographic factors. Overall, our work provides a complementary tool for assessing global variations in language proficiency, offering an alternative to existing approaches and helping to reduce the field’s overreliance on the English language in the cognitive and social sciences.

Language influences our thinking and affects many aspects of cognition, from how we perceive the world to how we interact socially. Thus, objectively characterizing linguistic background is crucial for research in many areas, including second language acquisition, psycholinguistics, and cognitive science. Traditional language proficiency tests, however, are manually composed by experts, which limits their scope in both lab and online settings. Here, we propose a pipeline that automatically derives a language proficiency test from a corpus of text, and we apply it to create new tests for 1,939 languages. Using this approach, we conducted a large-scale survey examining L1 and L2 proficiency across 34 countries, with participants tested on all 34 languages. Drawing on human ratings from 4,137 participants, our results validate that our test can effectively distinguish native speakers, second-language speakers, and nonspeakers within one minute, making it an effective tool for evaluating linguistic proficiency. We show that participants’ linguistic and demographic backgrounds systematically influence both their language proficiency and their self-reported skills, and we map the prevalence of global languages, such as English and Spanish, among online participants. Moreover, we show that our vocabulary tests are strongly correlated with other linguistic competences, such as listening and writing, in a set of typologically varied languages, demonstrating that our test is an efficient instrument for assessing language proficiency. More broadly, our work offers a significant resource for investigating global variation in language skills and contributes to reducing the overreliance on the English language in the cognitive and social sciences.
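The abstract does not describe how test items are derived from a text corpus. The following is a hypothetical sketch, not the authors' pipeline, and it assumes a yes/no vocabulary format. It samples real-word items from a corpus. It then generates pseudowords from a character-bigram model trained on the same corpus and rejects any candidate that occurs in the corpus as a real word. The goal is for pseudowords to look plausible for the language while not being words. All function names, parameters, and the bigram approach itself are assumptions made for illustration.

```python
import random
import re
from collections import Counter, defaultdict

START, END = "^", "$"  # word-boundary markers for the bigram model


def tokenize(text: str) -> list[str]:
    # Assumption: letter runs separated by non-letters. Scripts with combining
    # marks or without spaces between words would need a real tokenizer.
    return re.findall(r"[^\W\d_]+", text.lower())


def train_bigrams(words: list[str]) -> dict[str, Counter]:
    """Count character-to-character transitions, including word boundaries."""
    transitions: dict[str, Counter] = defaultdict(Counter)
    for w in words:
        chars = [START, *w, END]
        for a, b in zip(chars, chars[1:]):
            transitions[a][b] += 1
    return transitions


def sample_pseudoword(transitions, rng: random.Random, max_len: int = 12) -> str:
    """Walk the bigram chain from START until END or max_len characters."""
    out, cur = [], START
    while len(out) < max_len:
        options = transitions[cur]
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        if nxt == END:
            break
        out.append(nxt)
        cur = nxt
    return "".join(out)


def build_items(text: str, n_words: int, n_pseudo: int,
                seed: int = 0, min_len: int = 4, max_attempts: int = 10_000):
    """Hypothetical item builder: (real-word items, pseudoword items)."""
    rng = random.Random(seed)
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("corpus contains no words")
    vocab = set(tokens)
    # Sort before sampling so results depend only on the seed, not set order.
    candidates = sorted(w for w in vocab if len(w) >= min_len)
    real = rng.sample(candidates, min(n_words, len(candidates)))
    transitions = train_bigrams(tokens)
    pseudo: set[str] = set()
    for _ in range(max_attempts):
        if len(pseudo) >= n_pseudo:
            break
        p = sample_pseudoword(transitions, rng)
        if len(p) >= min_len and p not in vocab:  # reject real words
            pseudo.add(p)
    return real, sorted(pseudo)


corpus = ("the quick brown fox jumps over the lazy dog "
          "while the brown cat sleeps near the warm stove")
words, pseudowords = build_items(corpus, n_words=3, n_pseudo=3, seed=1)
print(words, pseudowords)
```

A bigram model is the simplest way to match low-level letter statistics. A practical pipeline would probably also control word frequency and length, and would draw on a much larger corpus than this toy string.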

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13038065/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13038065/full.md

## References

68 references — full list in the complete paper: https://tomesphere.com/paper/PMC13038065/full.md

---
Source: https://tomesphere.com/paper/PMC13038065