Computer-assisted Pronunciation Training -- Speech synthesis is almost all you need

Daniel Korzekwa; Jaime Lorenzo-Trueba; Thomas Drugman; Bozena Kostek

arXiv:2207.00774 · eess.AS · July 5, 2022

TL;DR

This paper uses speech synthesis to generate synthetic non-native speech as training data, which improves the accuracy of pronunciation error detection in computer-assisted pronunciation training (CAPT).

Contribution

The paper presents three speech generation methods, phoneme-to-phoneme (P2P), text-to-speech (T2S), and speech-to-speech (S2S), that improve pronunciation error detection models and establish new state-of-the-art results in CAPT.
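
As a rough illustration of the phoneme-level idea behind P2P, the sketch below corrupts a canonical phoneme sequence with random substitutions, deletions, and insertions, and records which input phonemes were altered. It is not the authors' implementation: the toy inventory `PHONEMES`, the function `perturb_phonemes`, and the `error_rate` parameter are hypothetical names chosen only for illustration.

```python
import random

# Hypothetical toy phoneme inventory (ARPAbet-like symbols), assumed for illustration.
PHONEMES = ["AH", "IY", "EH", "S", "T", "K", "R", "L"]


def perturb_phonemes(phonemes, error_rate=0.3, seed=None):
    """Simulate mispronunciations on a canonical phoneme sequence.

    Returns (perturbed, labels), where labels[i] is 1 if the i-th input
    phoneme was altered (substituted, deleted, or followed by an inserted
    phoneme) and 0 otherwise.
    """
    rng = random.Random(seed)
    perturbed, labels = [], []
    for ph in phonemes:
        if rng.random() >= error_rate:
            # Keep the phoneme unchanged.
            perturbed.append(ph)
            labels.append(0)
            continue
        labels.append(1)
        op = rng.choice(["substitute", "delete", "insert"])
        if op == "substitute":
            # Replace with a different phoneme from the inventory.
            perturbed.append(rng.choice([p for p in PHONEMES if p != ph]))
        elif op == "insert":
            # Keep the phoneme and add an extra one after it.
            perturbed.extend([ph, rng.choice(PHONEMES)])
        # "delete": emit nothing for this phoneme.
    return perturbed, labels


canonical = ["S", "T", "AH", "R"]
corrupted, error_labels = perturb_phonemes(canonical, error_rate=0.5, seed=1)
print(canonical, "->", corrupted, error_labels)
```

The paired output (corrupted sequence plus per-phoneme labels) is the kind of supervision an error detection model would need; uniform random edits are a simplifying assumption here.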

Findings

01

The S2S technique improves pronunciation error detection AUC by 41%

02

Synthetic speech generation enhances pronunciation error detection accuracy

03

The approach achieves a new state of the art on CAPT error detection metrics

Abstract

The research community has long studied computer-assisted pronunciation training (CAPT) methods in non-native speech. Researchers focused on studying various model architectures, such as Bayesian networks and deep learning methods, as well as on the analysis of different representations of the speech signal. Despite significant progress in recent years, existing CAPT methods are not able to detect pronunciation errors with high accuracy (only 60% precision at 40% to 80% recall). One of the key problems is the low availability of mispronounced speech that is needed for the reliable training of pronunciation error detection models. If we had a generative model that could mimic non-native speech and produce any amount of training data, then the task of detecting pronunciation errors would be much easier. We present three innovative techniques based on phoneme-to-phoneme (P2P),…
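
The abstract reports detector quality as precision at a given recall, and the findings above use AUC. The sketch below shows how these metrics could be computed from detector scores; the scores, labels, and function names are made up for illustration and are not results or code from the paper.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when flagging scores >= threshold as errors."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(scores, labels):
    """AUC as the fraction of (error, correct) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical detector outputs: higher score = more likely mispronounced.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 1, 0, 0]  # 1 = mispronounced, 0 = correct

for threshold in (0.85, 0.5):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
print(f"AUC={roc_auc(scores, labels):.3f}")
```

Lowering the threshold raises recall at the cost of precision, which is why the abstract quotes precision over a range of recall values rather than as a single number.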
