The AI Language Proficiency Monitor -- Tracking the Progress of LLMs on Multilingual Benchmarks
David Pomerenke, Jonas Nothnagel, Simon Ostermann

TL;DR
The paper presents the AI Language Proficiency Monitor, a multilingual benchmark that assesses LLMs across up to 200 languages, with a particular focus on low-resource languages. It comes with an open-source, auto-updating leaderboard and descriptive insights intended to promote transparency and inclusivity in AI development.
Contribution
The paper introduces a comprehensive, open-source multilingual benchmark and dashboard that evaluates LLMs on translation, question answering, math, and reasoning tasks across many languages, extending prior multilingual benchmarks to foster transparency and inclusivity.
Findings
Benchmark covers up to 200 languages, including low-resource ones.
Provides an open-source, auto-updating leaderboard and dashboard.
Offers insights like a global proficiency map and performance trends.
Abstract
To ensure equitable access to the benefits of large language models (LLMs), it is essential to evaluate their capabilities across the world's languages. We introduce the AI Language Proficiency Monitor, a comprehensive multilingual benchmark that systematically assesses LLM performance across up to 200 languages, with a particular focus on low-resource languages. Our benchmark aggregates diverse tasks including translation, question answering, math, and reasoning, using datasets such as FLORES+, MMLU, GSM8K, TruthfulQA, and ARC. We provide an open-source, auto-updating leaderboard and dashboard that supports researchers, developers, and policymakers in identifying strengths and gaps in model performance. In addition to ranking models, the platform offers descriptive insights such as a global proficiency map and trends over time. By complementing and extending prior multilingual…
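The abstract describes aggregating several tasks per language and ranking models on a leaderboard. A minimal sketch of one way such an aggregation could work is shown below. The unweighted averaging scheme, the function names, the model names, and every score are assumptions made for illustration; they are not taken from the paper or its codebase.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical placeholder scores in [0, 1], keyed by (model, language, task).
# These numbers are invented for illustration and are NOT results from the paper.
scores = {
    ("model-a", "eng", "translation"): 1.0,
    ("model-a", "eng", "question_answering"): 0.5,
    ("model-a", "swh", "translation"): 0.5,
    ("model-a", "swh", "question_answering"): 0.0,
    ("model-b", "eng", "translation"): 0.5,
    ("model-b", "eng", "question_answering"): 1.0,
    ("model-b", "swh", "translation"): 0.5,
    ("model-b", "swh", "question_answering"): 1.0,
}


def language_proficiency(scores):
    """Average all task scores for each (model, language) pair."""
    per_pair = defaultdict(list)
    for (model, lang, _task), value in scores.items():
        per_pair[(model, lang)].append(value)
    return {pair: mean(values) for pair, values in per_pair.items()}


def leaderboard(scores):
    """Rank models by their mean per-language proficiency.

    Each language counts equally here (an assumption); a real leaderboard
    might weight languages differently, e.g. by speaker population.
    """
    per_model = defaultdict(list)
    for (model, _lang), value in language_proficiency(scores).items():
        per_model[model].append(value)
    ranked = sorted(((mean(v), m) for m, v in per_model.items()), reverse=True)
    return [(model, score) for score, model in ranked]


if __name__ == "__main__":
    for model, score in leaderboard(scores):
        print(f"{model}: {score:.2f}")
```

Averaging per language first, and only then across languages, keeps a language with many available tasks from dominating one with few, which matters when low-resource languages have sparser task coverage.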
