Mispronunciation Detection in Non-native (L2) English with Uncertainty Modeling
Daniel Korzekwa, Jaime Lorenzo-Trueba, Szymon Zaporowski, Shira Calamaro, Thomas Drugman, Bozena Kostek

TL;DR
This paper introduces an uncertainty-aware model for detecting mispronunciations in non-native English speech. It accounts for errors in automatic phoneme recognition and for the existence of multiple valid pronunciations, which reduces false alarms and improves detection precision.
Contribution
It presents a novel approach that combines uncertainty modeling with multiple pronunciation variants, going beyond traditional methods that compare a single recognized phoneme sequence against a single expected pronunciation.
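The two ingredients named above can be illustrated with a minimal sketch. It assumes (hypothetically) that the recognizer provides a posterior distribution over phonemes at each canonical position and that the valid pronunciation variants are already aligned position by position. It then flags a phoneme only when little probability mass falls on any phoneme that some valid variant allows. This is not the authors' model, which this page does not describe in detail; the function names, the 0.7 threshold and the posteriors are illustrative assumptions.

```python
from typing import Dict, List, Sequence

# Hypothetical input format: one posterior distribution (phoneme -> probability)
# per canonical phoneme position, e.g. produced by some phoneme recognizer.
Posterior = Dict[str, float]


def mispronunciation_scores(
    posteriors: Sequence[Posterior], variants: Sequence[Sequence[str]]
) -> List[float]:
    """Per position, the probability mass NOT assigned to any phoneme that
    at least one valid pronunciation variant allows there."""
    for v in variants:
        # Simplifying assumption: variants are pre-aligned and equally long.
        if len(v) != len(posteriors):
            raise ValueError("each variant must have one phoneme per position")
    scores = []
    for i, dist in enumerate(posteriors):
        allowed = {v[i] for v in variants}  # a set avoids double-counting
        p_valid = sum(dist.get(ph, 0.0) for ph in allowed)
        scores.append(1.0 - p_valid)
    return scores


def detect(posteriors, variants, threshold=0.7):
    """Flag a position only when the model is fairly sure it is wrong."""
    return [s > threshold for s in mispronunciation_scores(posteriors, variants)]


def argmax_baseline(posteriors, canonical):
    """Single-best recognition vs. a single reference, for comparison."""
    return [max(dist, key=dist.get) != ph for dist, ph in zip(posteriors, canonical)]


# Illustrative, made-up posteriors for the word "either" (ARPAbet-style symbols).
posteriors = [
    {"AY": 0.6, "IY": 0.3, "EH": 0.1},
    {"D": 0.5, "DH": 0.45, "T": 0.05},
    {"AH": 0.85, "ER": 0.15},
]
variants = [["IY", "DH", "ER"], ["AY", "DH", "ER"]]

print(argmax_baseline(posteriors, variants[0]))  # [True, True, True]
print(detect(posteriors, variants))              # [False, False, True]
```

The single-best comparison flags all three phonemes. The sketch flags only the last one: the first is covered by the second valid variant, and the second is too uncertain to call a mispronunciation.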
Findings
Up to 18% relative increase in detection precision
Effective handling of pronunciation variability
Fewer false mispronunciation alarms
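For reference, detection precision and the "relative" increase above follow the standard definitions below, where TP counts mispronunciations correctly flagged and FP counts false alarms (correctly pronounced phonemes flagged as errors). The numeric case in the comment is purely illustrative and is not a result from the paper.

```latex
\text{precision} = \frac{TP}{TP + FP}, \qquad
\Delta_{\text{rel}} = \frac{P_{\text{new}} - P_{\text{base}}}{P_{\text{base}}}
% Illustrative only: P_base = 0.50 and P_new = 0.59 give (0.59 - 0.50)/0.50 = 0.18, i.e. an 18% relative increase.
```

Because FP sits in the denominator, reducing false alarms raises precision directly.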
Abstract
A common approach to the automatic detection of mispronunciation in language learning is to recognize the phonemes produced by a student and compare them to the expected pronunciation of a native speaker. This approach makes two simplifying assumptions: a) phonemes can be recognized from speech with high accuracy, b) there is a single correct way for a sentence to be pronounced. These assumptions do not always hold, which can result in a significant amount of false mispronunciation alarms. We propose a novel approach to overcome this problem based on two principles: a) taking into account uncertainty in the automatic phoneme recognition step, b) accounting for the fact that there may be multiple valid pronunciations. We evaluate the model on non-native (L2) English speech of German, Italian and Polish speakers, where it is shown to increase the precision of detecting mispronunciations by…
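The common approach described in the abstract can be sketched as follows: align the recognized phoneme sequence to the expected (canonical) one with a Levenshtein alignment, then flag every canonical phoneme that was substituted or deleted. This is a minimal illustration of that baseline, not the paper's implementation; the function names, the "-" gap symbol and the ARPAbet-style phoneme strings are assumptions.

```python
from typing import List, Tuple

GAP = "-"  # hypothetical gap symbol for insertions/deletions


def align(canonical: List[str], recognized: List[str]) -> List[Tuple[str, str]]:
    """Levenshtein alignment returning (expected, heard) pairs."""
    n, m = len(canonical), len(recognized)
    # dp[i][j] = edit distance between canonical[:i] and recognized[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if canonical[i - 1] == recognized[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    # Backtrace from the bottom-right cell to recover one optimal alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (
            0 if canonical[i - 1] == recognized[j - 1] else 1
        ):
            pairs.append((canonical[i - 1], recognized[j - 1]))  # match/substitution
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((canonical[i - 1], GAP))  # deletion
            i -= 1
        else:
            pairs.append((GAP, recognized[j - 1]))  # insertion
            j -= 1
    return pairs[::-1]


def detect_mispronunciations(canonical: List[str], recognized: List[str]) -> List[bool]:
    """One flag per canonical phoneme: True if it was not recognized as itself."""
    flags = []
    for expected, heard in align(canonical, recognized):
        if expected == GAP:
            continue  # insertions have no canonical position; ignored here
        flags.append(expected != heard)
    return flags


print(detect_mispronunciations(["K", "AE", "T"], ["K", "EH", "T"]))  # [False, True, False]
print(detect_mispronunciations(["K", "AE", "T"], ["K", "AE"]))       # [False, False, True]
```

Because this comparison trusts the single best recognition and a single reference, any recognition error or legitimate alternative pronunciation becomes a flag, which is the source of the false alarms the abstract describes.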
