Evaluating Standard and Dialectal Frisian ASR: Multilingual Fine-tuning and Language Identification for Improved Low-resource Performance

Reihaneh Amooie; Wietse de Vries; Yun Hao; Jelske Dijkstra; Matt Coler; Martijn Wieling

arXiv:2502.04883·cs.CL·February 10, 2025

Open Access

TL;DR

This paper improves low-resource Frisian ASR through multilingual fine-tuning and an auxiliary language identification task, shows that performance drops substantially on dialectal speech, and argues that diverse dialectal data is needed for realistic evaluation.

Contribution

It introduces a multilingual fine-tuning approach with an auxiliary language identification task to improve Frisian ASR, and shows that the way dialectal data is collected affects measured performance.

Findings

01

Multilingual fine-tuning improves Frisian ASR performance.

02

Recognition performance is substantially worse on dialectal speech (Clay Frisian, Wood Frisian, and South Frisian) than on standard Frisian.

03

Evaluation on standard data alone may overestimate real-world performance on dialectal speech.
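
The gap between standard and dialectal performance can be made concrete by scoring each speech variety separately. The following is a minimal sketch, assuming word error rate (WER) as the metric; the function names and the sample transcripts are hypothetical placeholders, not the paper's evaluation code or data.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_group(samples):
    """samples: iterable of (group, reference, hypothesis) strings.

    Returns {group: WER}, pooling errors and reference words per group.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, ref, hyp in samples:
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        errors[group] += edit_distance(ref_tokens, hyp_tokens)
        words[group] += len(ref_tokens)
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


# Made-up placeholder transcripts, not data from the paper.
samples = [
    ("standard", "a b c d", "a b c d"),
    ("dialect", "a b c d", "a x c"),
]
print(wer_by_group(samples))  # {'standard': 0.0, 'dialect': 0.5}
```

Reporting one pooled score would hide the difference between the two groups, which is why a per-group breakdown is useful when a test set mixes standard and dialectal speech.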

Abstract

Automatic Speech Recognition (ASR) performance for low-resource languages is still far behind that of higher-resource languages such as English, due to a lack of sufficient labeled data. State-of-the-art methods deploy self-supervised transfer learning where a model pre-trained on large amounts of data is fine-tuned using little labeled data in a target low-resource language. In this paper, we present and examine a method for fine-tuning an SSL-based model in order to improve the performance for Frisian and its regional dialects (Clay Frisian, Wood Frisian, and South Frisian). We show that Frisian ASR performance can be improved by using multilingual (Frisian, Dutch, English and German) fine-tuning data and an auxiliary language identification task. In addition, our findings show that performance on dialectal speech suffers substantially, and, importantly, that this effect is moderated…
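
The abstract does not say how the auxiliary language identification (LID) objective is combined with the ASR objective. One common multi-task formulation, shown here purely as an assumption, is a weighted sum of the ASR loss and a cross-entropy loss over the four fine-tuning languages. The weight `lid_weight`, the language codes, and the function names are hypothetical.

```python
import math

# Assumed label set for the auxiliary task: Frisian, Dutch, English, German.
LANGUAGES = ["fy", "nl", "en", "de"]


def lid_cross_entropy(logits, target_index):
    """Cross-entropy for one utterance, via a numerically stable log-softmax."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]


def multitask_loss(asr_loss, lid_logits, lid_target, lid_weight=0.1):
    """Weighted sum of the ASR loss and the auxiliary LID loss.

    asr_loss is taken as a precomputed number (e.g. a CTC loss value);
    lid_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    return asr_loss + lid_weight * lid_cross_entropy(lid_logits, lid_target)


# Uniform logits give an LID loss of log(4), so the total is 2.0 + 0.1 * log(4).
loss = multitask_loss(
    asr_loss=2.0,
    lid_logits=[0.0, 0.0, 0.0, 0.0],
    lid_target=LANGUAGES.index("fy"),
)
print(round(loss, 4))  # 2.1386
```

In a real training setup both terms would be computed by the model on each batch; the sketch only illustrates how an auxiliary objective can contribute to the total loss.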

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Phonetics and Phonology Research