CAMÕES: A Comprehensive Automatic Speech Recognition Benchmark for European Portuguese

Carlos Carvalho; Francisco Teixeira; Catarina Botelho; Anna Pompili; Rubén Solera-Ureña; Sérgio Paulo; Mariana Julião; Thomas Rolland; John Mendonça; Diogo Pereira; Isabel Trancoso; Alberto Abad

arXiv:2508.19721·cs.CL·August 28, 2025

TL;DR

This paper introduces CAMÕES, a comprehensive benchmark and collection of models for European Portuguese speech recognition, addressing the lack of resources for this language variety and establishing new state-of-the-art results.

Contribution

It provides the first open framework for European Portuguese ASR, including a benchmark and diverse models, advancing research in under-explored language varieties.

Findings

01

Fine-tuned foundation models perform comparably to E-Branchformer.

02

Best models improve WER by over 35% relative to zero-shot models.

03

Established new state-of-the-art for European Portuguese ASR.

Abstract

Existing resources for Automatic Speech Recognition in Portuguese are mostly focused on Brazilian Portuguese, leaving European Portuguese (EP) and other varieties under-explored. To bridge this gap, we introduce CAMÕES, the first open framework for EP and other Portuguese varieties. It consists of (1) a comprehensive evaluation benchmark, including 46h of EP test data spanning multiple domains; and (2) a collection of state-of-the-art models. For the latter, we consider multiple foundation models, evaluating their zero-shot and fine-tuned performances, as well as E-Branchformer models trained from scratch. A curated set of 425h of EP was used for both fine-tuning and training. Our results show comparable performance for EP between fine-tuned foundation models and the E-Branchformer. Furthermore, the best-performing models achieve relative improvements above 35% WER, compared to the…
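The abstract reports results as word error rate (WER) and as a relative improvement in WER. The sketch below shows how those two quantities are conventionally computed. It is not the authors' evaluation code: the function names are hypothetical, text normalization is reduced to whitespace splitting, and any numbers passed to it would be illustrative rather than results from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

Under this definition, a hypothetical drop from 20% to 13% WER is a relative improvement of about 35%, although the absolute difference is only 7 percentage points.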

Code & Models

Models

Datasets

inesc-id/camoes_asr
dataset· 7 dl
