An End-to-End Mispronunciation Detection System for L2 English Speech Leveraging Novel Anti-Phone Modeling

Bi-Cheng Yan; Meng-Che Wu; Hsiao-Tsung Hung; Berlin Chen

arXiv:2005.11950 · eess.AS · August 31, 2020 · 1 cites

TL;DR

This paper introduces an end-to-end mispronunciation detection system for L2 English that leverages anti-phone modeling to better detect both categorical and non-categorical errors, outperforming existing methods.

Contribution

It proposes a novel anti-phone expansion and transfer-learning paradigm for end-to-end ASR-based mispronunciation detection, addressing non-categorical errors without relying on phonological rules.

Findings

01

Outperforms existing E2E baseline in F1-score by 11.05%.

02

Achieves 27.71% improvement over GOP-based pronunciation scoring.

03

Effectively detects both categorical and non-categorical mispronunciations.

Abstract

Mispronunciation detection and diagnosis (MDD) is a core component of computer-assisted pronunciation training (CAPT). Most of the existing MDD approaches focus on dealing with categorical errors (viz. one canonical phone is substituted by another one, aside from those mispronunciations caused by deletions or insertions). However, accurate detection and diagnosis of non-categorical or distortion errors (viz. approximating L2 phones with L1 (first-language) phones, or erroneous pronunciations in between) still seems out of reach. In view of this, we propose to conduct MDD with a novel end-to-end automatic speech recognition (E2E-based ASR) approach. In particular, we expand the original L2 phone set with their corresponding anti-phone set, making the E2E-based MDD approach have a better capability to take in both categorical and non-categorical mispronunciations, aiming to provide better…
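The phone-set expansion described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `#` prefix for anti-phones, the function names, the rule of labeling a distorted phone with its anti-phone, and the assumption that canonical and recognized sequences are already aligned one-to-one are all hypothetical choices made for illustration only.

```python
# Illustrative sketch only; names and conventions here are assumptions,
# not taken from the paper.

ANTI_PREFIX = "#"  # hypothetical marker for an anti-phone


def expand_with_anti_phones(phones):
    """Return the original phones followed by one anti-phone per phone."""
    return list(phones) + [ANTI_PREFIX + p for p in phones]


def label_targets(canonical, is_correct):
    """Build training targets: keep the canonical phone when it was
    pronounced correctly, otherwise use its anti-phone (assumed rule)."""
    return [p if ok else ANTI_PREFIX + p for p, ok in zip(canonical, is_correct)]


def flag_mispronunciations(canonical, recognized):
    """Flag each position where the recognized unit differs from the
    canonical phone. An anti-phone (non-categorical error) and a different
    phone (categorical error) are both flagged. Assumes the two sequences
    are already aligned and of equal length."""
    if len(canonical) != len(recognized):
        raise ValueError("sequences must be aligned to the same length")
    return [c != r for c, r in zip(canonical, recognized)]


if __name__ == "__main__":
    inventory = expand_with_anti_phones(["k", "ae", "t"])
    print(inventory)
    print(label_targets(["k", "ae", "t"], [True, False, True]))
    print(flag_mispronunciations(["k", "ae", "t"], ["k", "#ae", "d"]))
```

In this sketch the recognizer's output vocabulary doubles in size, so a distorted realization of a phone has its own label instead of being forced onto the nearest canonical phone.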

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Phonetics and Phonology Research