Automated detection of pronunciation errors in non-native English speech employing deep learning
Daniel Korzekwa

TL;DR
This paper introduces advanced deep learning techniques for detecting pronunciation errors in non-native English speech, significantly improving accuracy over existing methods by generating synthetic mispronounced data and employing end-to-end detection models.
Contribution
It presents novel deep learning approaches, including synthetic data generation and multi-task end-to-end models, to enhance pronunciation error detection in non-native speech.
Findings
Outperformed the state-of-the-art method in the AUC metric by 41% (from 0.528 to 0.749)
Effective detection and reconstruction of dysarthric speech
Applied at Amazon for automatic pronunciation error detection
Abstract
Despite significant advances in recent years, existing Computer-Assisted Pronunciation Training (CAPT) methods detect pronunciation errors with relatively low accuracy (precision of 60% at 40%-80% recall). This Ph.D. work proposes novel deep learning methods for detecting pronunciation errors in non-native (L2) English speech, outperforming the state-of-the-art method in the AUC (Area Under the Curve) metric by 41%, i.e., from 0.528 to 0.749. One problem with existing CAPT methods is the scarcity of annotated mispronounced speech, which is needed to train pronunciation error detection models reliably. The detection of pronunciation errors is therefore reformulated as the task of generating synthetic mispronounced speech. Intuitively, if we could mimic mispronounced speech and produce any amount of training data, detecting pronunciation errors would be more effective.…
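To make the reported numbers concrete, the following is a minimal sketch of the arithmetic behind the "41%" figure and of how precision and recall are read off at one decision threshold. The function names, the toy labels and scores, and the threshold are illustrative assumptions, not taken from the thesis; the abstract also does not say which curve the AUC is computed over, so the sketch does not compute an AUC.

```python
from typing import Sequence, Tuple


def relative_improvement(baseline: float, new: float) -> float:
    """Relative gain of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline


def precision_recall(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold is flagged as an error.

    labels: 1 = mispronounced, 0 = correctly pronounced (hypothetical toy data).
    """
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y and p)
    fp = sum(1 for y, p in zip(labels, predicted) if not y and p)
    fn = sum(1 for y, p in zip(labels, predicted) if y and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# AUC values quoted in the abstract.
gain = relative_improvement(0.528, 0.749)
print(f"relative AUC gain: {gain:.3f}")

# Toy example (assumed data): six words with detector scores.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
precision, recall = precision_recall(toy_labels, toy_scores, threshold=0.5)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

The gain is relative rather than absolute: the AUC rises by 0.221 points, which is roughly 41% of the baseline value of 0.528. Sweeping the threshold in `precision_recall` traces out the precision-recall trade-off that the "60% precision at 40%-80% recall" figure summarizes.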
Taxonomy
Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Phonetics and Phonology Research
Methods: ALIGN
