Masked Acoustic Unit for Mispronunciation Detection and Correction

Zhan Zhang, Yuehai Wang, and Jianyi Yang

arXiv:2108.05517 · eess.AS · May 5, 2022

TL;DR

This paper introduces an approach to computer-assisted pronunciation training (CAPT) that uses masked acoustic units to detect and correct mispronunciations, giving language learners speech-based feedback rather than text-only feedback.

Contribution

It proposes a masked acoustic unit method that improves mispronunciation detection and correction without requiring detailed phoneme annotations.

Findings

01

Effective mispronunciation detection and correction using acoustic units

02

Provides speech-based feedback for language learners

03

Reduces reliance on expensive annotated datasets

Abstract

Computer-Assisted Pronunciation Training (CAPT) plays an important role in language learning. Conventional ASR-based CAPT methods require expensive annotation of the ground truth pronunciation for supervised training. Meanwhile, certain undefined non-native phonemes cannot be correctly classified into standard phonemes, making the annotation process challenging and subjective. On the other hand, ASR-based CAPT methods only give the learner text-based feedback about the mispronunciation, but cannot teach the learner how to pronounce the sentence correctly. To address these limitations, we propose to use the acoustic unit (AU) as the intermediary feature for both mispronunciation detection and correction. The proposed method uses the masked AU sequence and the target phonemes to detect erroneous AUs and then correct them. This method can give the learner speech-based self-imitating…
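To make the detect-then-correct idea concrete, here is a toy sketch in Python. It is not the authors' model. Several assumptions are made for illustration only: each AU is aligned one-to-one with a target phoneme, the AU ids and the `MASK` token are made up, and the hypothetical `predict_masked_au` is a fixed lookup table standing in for a trained network that would also use the unmasked AU context. Each position is masked in turn and its AU is predicted from the target phonemes. If the observed AU gets low probability it is flagged as an error, and the most likely AU replaces it.

```python
from typing import Dict, List, Sequence, Tuple

MASK = -1  # hypothetical mask token id

# Hypothetical toy "model": P(AU id | target phoneme). A real system would use a
# trained network that also conditions on the unmasked AU context.
TOY_AU_GIVEN_PHONEME: Dict[str, Dict[int, float]] = {
    "TH": {7: 0.8, 3: 0.1, 12: 0.1},
    "IH": {4: 0.9, 5: 0.1},
    "NG": {9: 0.85, 2: 0.15},
}


def predict_masked_au(masked: Sequence[int], pos: int,
                      phonemes: Sequence[str]) -> Dict[int, float]:
    """Return a distribution over AU ids for the masked position.

    Toy stand-in: it looks only at the aligned target phoneme and ignores
    the surrounding AUs.
    """
    assert masked[pos] == MASK
    return TOY_AU_GIVEN_PHONEME[phonemes[pos]]


def detect_and_correct(aus: Sequence[int], phonemes: Sequence[str],
                       threshold: float = 0.2) -> Tuple[List[int], List[int]]:
    """Flag AUs the model finds unlikely and replace them with its best guess."""
    assert len(aus) == len(phonemes)  # assumes one AU per target phoneme
    errors: List[int] = []
    corrected = list(aus)
    for i, observed in enumerate(aus):
        masked = list(aus)
        masked[i] = MASK  # hide the observed AU so it cannot simply be copied
        dist = predict_masked_au(masked, i, phonemes)
        if dist.get(observed, 0.0) < threshold:
            errors.append(i)
            corrected[i] = max(dist, key=dist.get)  # most probable AU
    return errors, corrected


if __name__ == "__main__":
    # Target word "thing"; the learner's first AU (3) is an unlikely realization of TH.
    print(detect_and_correct([3, 4, 9], ["TH", "IH", "NG"]))
```

Masking each position before predicting it keeps the decision from leaning on the possibly mispronounced AU itself. In this toy the first AU is flagged and replaced with AU 7. Turning a corrected AU sequence back into audio for the learner is out of scope for this sketch.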

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Natural Language Processing Techniques