ALBA: A European Portuguese Benchmark for Evaluating Language and Linguistic Dimensions in Generative LLMs
Inês Vieira, Inês Calvo, Iago Paulo, James Furtado, Rafael Ferreira, Diogo Tavares, Diogo Glória-Silva, David Semedo, João Magalhães

TL;DR
ALBA is a new European Portuguese benchmark assessing LLMs across eight linguistic dimensions, highlighting variability in performance and the need for comprehensive, language-specific evaluation tools.
Contribution
The paper introduces ALBA, a linguistically grounded, expert-constructed benchmark for evaluating LLMs in European Portuguese across multiple linguistic aspects.
Findings
LLM performance varies across the eight linguistic dimensions.
ALBA reveals dimension-specific strengths and weaknesses of LLMs in pt-PT.
The benchmark pairs manual construction by language experts with an LLM-as-a-judge framework for scalable evaluation.
Abstract
As Large Language Models (LLMs) expand across multilingual domains, evaluating their performance in under-represented languages becomes increasingly important. European Portuguese (pt-PT) is particularly affected, as existing training data and benchmarks are mainly in Brazilian Portuguese (pt-BR). To address this, we introduce ALBA, a linguistically grounded benchmark designed from the ground up to assess LLM proficiency in linguistics-related tasks in pt-PT across eight linguistic dimensions: Language Variety, Culture-bound Semantics, Discourse Analysis, Word Plays, Syntax, Morphology, Lexicology, and Phonetics and Phonology. ALBA is manually constructed by language experts and paired with an LLM-as-a-judge framework for scalable evaluation of pt-PT generated language. Experiments on a diverse set of models reveal performance variability across linguistic dimensions,…
