Mispronunciation Detection Without L2 Pronunciation Dataset in Low-Resource Setting: A Case Study in Finland Swedish

Nhan Phan; Mikko Kuronen; Maria Kautonen; Riikka Ullakonoja; Anna von Zansen; Yaroslav Getman; Ekaterina Voskoboinik; Tamás Grósz; Mikko Kurimo

arXiv:2506.01156 · cs.CL · August 21, 2025

TL;DR

This study presents a simple, language-independent mispronunciation detection model for Finland Swedish that is trained on L1 speech and requires minimal L2 data, addressing the lack of such tools for low-resource languages.

Contribution

We introduce a novel, minimal-data, language-independent mispronunciation detection method tailored for low-resource languages such as Finland Swedish.

Findings

1. Achieved 43.2% recall and 29.8% precision in L2 mispronunciation detection.

2. Improved the balance between recall and precision compared with the baseline.

3. Demonstrated effectiveness with limited L2 data in a low-resource setting.
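To make the reported figures concrete, the sketch below shows how precision and recall are usually defined for mispronunciation detection, assuming phone-level counting where a "positive" is a phone the detector flags as mispronounced. The counts in the usage lines are hypothetical, and the F1 value is simply our own harmonic mean of the two reported percentages, not a number taken from the paper.

```python
from typing import Tuple


def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 for mispronunciation detection.

    tp: mispronounced phones that were flagged
    fp: correctly pronounced phones that were wrongly flagged
    fn: mispronounced phones the detector missed
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from_pr(precision, recall)


# Hypothetical counts, for illustration only.
p, r, f = precision_recall_f1(tp=3, fp=7, fn=4)

# Harmonic mean of the reported precision (29.8%) and recall (43.2%): about 0.353.
reported_f1 = f1_from_pr(0.298, 0.432)
print(p, r, f, reported_f1)
```

Precision penalizes false alarms on correct speech, while recall penalizes missed errors, which is why the two are traded off against each other.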

Abstract

Mispronunciation detection (MD) models are the cornerstones of many language learning applications. Unfortunately, most systems are built for English and other major languages, while low-resource language varieties, such as Finland Swedish (FS), lack such tools. In this paper, we introduce our MD model for FS, trained on 89 hours of first-language (L1) speakers' spontaneous speech and tested on 33 minutes of transcribed L2 read-aloud speech. We trained a multilingual wav2vec 2.0 model with entropy regularization and applied temperature scaling and top-k normalization after inference to better adapt it for MD. The main novelty of our method lies in its simplicity, requiring minimal L2 data. The process is also language-independent, making it suitable for other low-resource languages. Our proposed algorithm allows us to balance Recall (43.2%) and Precision (29.8%), compared with…
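The abstract names three ingredients without giving their exact form. The sketch below shows generic versions of each on per-frame phone posteriors, under stated assumptions: temperature scaling as softmax(z / T); top-k normalization read as "keep the k most probable classes per frame and renormalize"; and entropy regularization as subtracting β times the mean frame entropy from the training loss to discourage overconfident outputs. All function names, logits and parameter values are hypothetical; this is not the authors' implementation (see the aalto-speech/FinSwedish repository for that).

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax for one frame: softmax(z / T).

    T > 1 flattens the distribution; T < 1 sharpens it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_normalize(probs: List[float], k: int) -> List[float]:
    """Assumed top-k normalization: keep the k largest probabilities,
    zero out the rest, and renormalize so the kept mass sums to 1."""
    if not 1 <= k <= len(probs):
        raise ValueError("k must be in [1, len(probs)]")
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept_mass = sum(probs[i] for i in keep)
    out = [0.0] * len(probs)
    for i in keep:
        out[i] = probs[i] / kept_mass
    return out


def entropy(probs: List[float]) -> float:
    """Shannon entropy of one frame's distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_regularized_loss(base_loss: float, frames: List[List[float]], beta: float) -> float:
    """Hypothetical regularizer: base loss minus beta * mean frame entropy.

    The sign and form are assumptions; subtracting entropy rewards
    less peaked (less overconfident) posteriors during training.
    """
    mean_h = sum(entropy(f) for f in frames) / len(frames)
    return base_loss - beta * mean_h


# Usage with made-up logits for one frame over four phone classes.
frame_logits = [4.0, 2.0, 1.0, -1.0]
p_sharp = softmax_with_temperature(frame_logits, temperature=1.0)
p_flat = softmax_with_temperature(frame_logits, temperature=2.0)
p_topk = top_k_normalize(p_flat, k=2)
loss = entropy_regularized_loss(1.5, [p_sharp, p_flat], beta=0.1)
print(p_sharp, p_flat, p_topk, loss)
```

Raising T spreads probability onto competing phones, and top-k normalization then discards the long tail, so a canonical phone's score reflects its competition with only the few most likely alternatives.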

Code & Models

Repositories

aalto-speech/FinSwedish (official)