ALBA: A European Portuguese Benchmark for Evaluating Language and Linguistic Dimensions in Generative LLMs

Inês Vieira; Inês Calvo; Iago Paulo; James Furtado; Rafael Ferreira; Diogo Tavares; Diogo Glória-Silva; David Semedo; João Magalhães

arXiv:2603.26516 · cs.CL · March 30, 2026

TL;DR

ALBA is a new European Portuguese benchmark assessing LLMs across eight linguistic dimensions, highlighting variability in performance and the need for comprehensive, language-specific evaluation tools.

Contribution

The paper introduces ALBA, a linguistically grounded, expert-constructed benchmark for evaluating LLMs in European Portuguese across multiple linguistic aspects.

Findings

01 LLM performance varies across linguistic dimensions.

02 ALBA reveals specific strengths and weaknesses of LLMs in pt-PT.

03 The benchmark supports scalable, expert-driven evaluation of language models.

Abstract

As Large Language Models (LLMs) expand across multilingual domains, evaluating their performance in under-represented languages becomes increasingly important. European Portuguese (pt-PT) is particularly affected, as existing training data and benchmarks are mainly in Brazilian Portuguese (pt-BR). To address this, we introduce ALBA, a linguistically grounded benchmark designed from the ground up to assess LLM proficiency in linguistic-related tasks in pt-PT across eight linguistic dimensions: Language Variety, Culture-bound Semantics, Discourse Analysis, Word Plays, Syntax, Morphology, Lexicology, and Phonetics and Phonology. ALBA is manually constructed by language experts and paired with an LLM-as-a-judge framework for scalable evaluation of pt-PT generated language. Experiments on a diverse set of models reveal performance variability across linguistic dimensions,…
