The Faetar Benchmark: Speech Recognition in a Very Under-Resourced Language

Michael Ong; Sean Robertson; Leo Peckham; Alba Jorquera Jimenez de Aberasturi; Paula Arkhangorodsky; Robin Huo; Aman Sakhardande; Mark Hallap; Naomi Nagy; Ewan Dunbar

arXiv:2409.08103·cs.CL·May 27, 2025

TL;DR

This paper introduces the Faetar Benchmark, a challenging low-resource speech recognition corpus for Faetar, a Franco-Provençal variety spoken primarily in Italy, and reports baseline results from multilingual speech foundation models.

Contribution

It presents a new benchmark corpus for under-resourced speech recognition in Faetar and assesses state-of-the-art models on this challenging dataset.

Findings

01  Best baseline phone error rate: 30.4%

02  Continued pre-training on the 20 hrs of unlabelled speech gives the best baseline

03  Noisy field recordings, only 5 hrs of transcribed speech, and variable-quality forced alignment make the corpus a hard low-resource test case

Abstract

We introduce the Faetar Automatic Speech Recognition Benchmark, a benchmark corpus designed to push the limits of current approaches to low-resource speech recognition. Faetar, a Franco-Provençal variety spoken primarily in Italy, has no standard orthography, has virtually no existing textual or speech resources other than what is included in the benchmark, and is quite different from other forms of Franco-Provençal. The corpus comes from field recordings, most of which are noisy, for which only 5 hrs have matching transcriptions, and for which forced alignment is of variable quality. The corpus contains an additional 20 hrs of unlabelled speech. We report baseline results from state-of-the-art multilingual speech foundation models with a best phone error rate of 30.4%, using a pipeline that continues pre-training on the foundation model using the unlabelled set.
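The abstract reports results as a phone error rate (PER) without spelling out the metric. As a minimal sketch, assuming the conventional definition (total Levenshtein edit distance between hypothesis and reference phone sequences, divided by the total number of reference phones), PER could be computed as follows. The function names and the example phone strings are illustrative assumptions, not taken from the benchmark or its scoring scripts.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference phone missing from hyp)
                curr[j - 1] + 1,     # insertion (extra phone in hyp)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def phone_error_rate(refs: Sequence[Sequence[str]],
                     hyps: Sequence[Sequence[str]]) -> float:
    """Corpus-level PER: total edits over total reference phones."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of utterances")
    total_phones = sum(len(r) for r in refs)
    if total_phones == 0:
        raise ValueError("reference set contains no phones")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_phones


if __name__ == "__main__":
    # Hypothetical phone sequences, for illustration only.
    refs = [["a", "b", "c", "d"]]
    hyps = [["a", "x", "c"]]
    print(f"PER = {phone_error_rate(refs, hyps):.1%}")
```

Under this definition, a PER of 30.4% corresponds to roughly three edits (substitutions, deletions, or insertions) for every ten reference phones. Because insertions count as errors, the value can exceed 100% in principle.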

Taxonomy

Topics: Speech Recognition and Synthesis