L1-aware Multilingual Mispronunciation Detection Framework

Yassine El Kheir; Shammur Absar Chowdhury; Ahmed Ali

arXiv:2309.07719 · cs.CL · September 22, 2023

TL;DR

This paper presents a multilingual mispronunciation detection framework that incorporates L1-aware speech representations, improving accuracy and robustness across multiple languages and datasets.
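The summary does not say how the L1-aware representation is combined with the main network; the abstract only says the embeddings are "infused" into it. The sketch below assumes, purely for illustration, one common option: a fixed-size utterance-level embedding is broadcast across time and concatenated onto every frame of the encoder output. The function name `infuse_embedding` and all shapes are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # [time][feature]


def infuse_embedding(frames: Matrix, utt_embedding: List[float]) -> Matrix:
    """Hypothetical sketch: concatenate one utterance-level L1/L2 embedding
    onto every frame of a frame-level encoder output.

    frames:        T x D encoder features (assumed layout)
    utt_embedding: E-dimensional embedding from an auxiliary model (assumed)
    returns:       T x (D + E) fused features
    """
    if not frames:
        return []
    width = len(frames[0])
    if any(len(f) != width for f in frames):
        raise ValueError("all frames must have the same feature dimension")
    # Broadcast the same embedding to every time step, then concatenate.
    return [list(f) + list(utt_embedding) for f in frames]


frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # T=3, D=2 (toy values)
l1_l2_embedding = [1.0, -1.0]                   # E=2 (toy values)
fused = infuse_embedding(frames, l1_l2_embedding)
print(len(fused), len(fused[0]))  # 3 4
```

Concatenation leaves the acoustic features untouched and lets later layers learn how much weight to give the speaker-level cue. Projection followed by addition, or a learned gate, are equally plausible alternatives; the source does not say which is used.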

Contribution

The paper introduces L1-MultiMDD, a novel end-to-end multilingual mispronunciation detection architecture with L1-aware embeddings and multi-task training, enhancing detection performance.
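According to the abstract, the auxiliary model that produces the L1-L2 embeddings is pretrained in a multi-task setup that identifies both the speaker's L1 and the L2. A minimal sketch of such an objective, assuming two softmax classification heads whose cross-entropy losses are combined as a weighted sum, is shown below. The weight `alpha`, the function names, and the logits are hypothetical and not taken from the paper.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits: List[float], target: int) -> float:
    # Negative log-probability assigned to the target class.
    return -log_softmax(logits)[target]


def multitask_loss(l1_logits: List[float], l1_target: int,
                   l2_logits: List[float], l2_target: int,
                   alpha: float = 0.5) -> float:
    """Hypothetical joint objective: alpha * CE(L1 head) + (1 - alpha) * CE(L2 head)."""
    return (alpha * cross_entropy(l1_logits, l1_target)
            + (1.0 - alpha) * cross_entropy(l2_logits, l2_target))


# Toy example: 3 candidate L1s and 3 target L2s (e.g. English, Arabic, Mandarin).
loss = multitask_loss([2.0, 0.5, -1.0], 0, [0.1, 1.5, 0.2], 1)
print(round(loss, 4))
```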

Findings

01. Significant reduction in phoneme error rate (PER) across languages.

02. Improved false rejection rate (FRR), demonstrating robustness.

03. Effective generalization to unseen datasets.
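The findings are stated in terms of PER and FRR. The sketch below uses the definitions that are common in the MDD literature, assumed here rather than taken from the paper: PER is the Levenshtein distance (substitutions + deletions + insertions) between the recognized and reference phoneme sequences divided by the reference length, and FRR is the share of correctly pronounced phonemes that the system wrongly flags, FR / (TA + FR). The per-phoneme labels passed to `false_rejection_rate` are assumed to be already aligned.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    # Dynamic-programming Levenshtein distance over phoneme tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def false_rejection_rate(truly_mispronounced: List[bool], flagged: List[bool]) -> float:
    """FRR = FR / (TA + FR), computed over phonemes an annotator judged correct.

    truly_mispronounced[i]: annotator marked phoneme i as mispronounced
    flagged[i]:             system marked phoneme i as mispronounced
    """
    if len(truly_mispronounced) != len(flagged):
        raise ValueError("label lists must be aligned")
    pairs = list(zip(truly_mispronounced, flagged))
    ta = sum(1 for t, f in pairs if not t and not f)  # true acceptances
    fr = sum(1 for t, f in pairs if not t and f)      # false rejections
    return fr / (ta + fr) if ta + fr else 0.0


print(phoneme_error_rate(["k", "ae", "t"], ["k", "t"]))  # 1/3: one deletion over three phonemes
print(false_rejection_rate([False, False, True, False], [False, True, True, False]))  # 1/3
```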

Abstract

The phonological discrepancies between a speaker's native language (L1) and the non-native language (L2) are a major source of mispronunciation. This paper introduces a novel multilingual mispronunciation detection and diagnosis (MDD) architecture, L1-MultiMDD, enriched with L1-aware speech representations. An end-to-end speech encoder is trained on the input signal and its corresponding reference phoneme sequence. First, an attention mechanism aligns the input audio with the reference phoneme sequence. Next, L1-L2 speech embeddings are extracted from an auxiliary model, pretrained in a multi-task setup to identify the L1 and L2 languages, and are infused into the primary network. Finally, L1-MultiMDD is optimized for a unified multilingual phoneme recognition task using the connectionist temporal classification (CTC) loss for the target languages: English, Arabic, and Mandarin. Our experiments demonstrate…
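The abstract names CTC as the training loss for the unified multilingual phoneme recognition task. As a generic illustration of that loss (not the paper's implementation), the sketch below computes the CTC negative log-likelihood of a phoneme label sequence using the standard forward (alpha) recursion over the blank-extended label sequence. It works in plain probabilities for readability, so it only suits short toy inputs; practical systems work in log space. The blank index 0 and the toy posteriors are assumptions.

```python
import math
from typing import List, Sequence


def ctc_loss(probs: Sequence[Sequence[float]], labels: Sequence[int], blank: int = 0) -> float:
    """Negative log-likelihood -log p(labels | probs) under CTC.

    probs:  T x V per-frame posteriors (each row sums to 1)
    labels: target label indices, must not contain `blank`
    """
    # Blank-extended sequence [b, l1, b, l2, ..., b] of length 2L + 1.
    ext: List[int] = [blank]
    for lab in labels:
        ext += [lab, blank]
    S, T = len(ext), len(probs)

    # alpha[s]: total probability of all path prefixes ending at ext[s] at time t.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                  # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]         # advance by one position
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last label or on the trailing blank.
    p = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return math.inf if p == 0.0 else -math.log(p)


# Two frames, vocabulary {blank, /a/}; the target is the single phoneme /a/.
# Valid paths: (a, a), (blank, a), (a, blank) -> p = 0.42 + 0.28 + 0.18 = 0.88.
probs = [[0.4, 0.6], [0.3, 0.7]]
print(ctc_loss(probs, [1]))
```

Because CTC sums over every frame-level path that collapses to the target, the recognizer needs no frame-level phoneme alignment, which matters when the same model must cover English, Arabic, and Mandarin data.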

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Methods: ALIGN