A Full Text-Dependent End to End Mispronunciation Detection and Diagnosis with Easy Data Augmentation Techniques

Kaiqi Fu; Jones Lin; Dengfeng Ke; Yanlu Xie; Jinsong Zhang; and Binghuai Lin

arXiv:2104.08428 · cs.CL · April 20, 2021 · 26 cites

Open Access · 1 Repo

TL;DR

This paper introduces a novel text-dependent end-to-end mispronunciation detection system that leverages prior text and data augmentation to improve phoneme mispronunciation detection accuracy.

Contribution

The paper proposes a fully end-to-end, text-dependent MD&D model using attention mechanisms and introduces three data augmentation techniques to address class imbalance.

Findings

01

Achieved an F-measure of 56.08%, outperforming previous CNN-RNN-CTC models.

02

Effectively mitigated class imbalance with simple data augmentation methods.

03

Demonstrated improved mispronunciation detection on the L2-ARCTIC dataset.
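
For readers unfamiliar with the metric in the first finding, the F-measure is the harmonic mean of precision and recall. The sketch below computes it from raw counts. The function name, the argument names, and the choice of which class counts as a "hit" are illustrative assumptions, not taken from the paper or its repository.

```python
def f_measure(hits: int, false_alarms: int, misses: int, beta: float = 1.0) -> float:
    """F-beta score from raw counts (hypothetical helper, not the paper's code).

    hits:         items of the target class that were correctly flagged
    false_alarms: items flagged as the target class that were not
    misses:       items of the target class that were not flagged
    """
    if hits == 0:
        # Precision and recall are both zero or undefined; report 0.0.
        return 0.0
    precision = hits / (hits + false_alarms)
    recall = hits / (hits + misses)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(round(f_measure(8, 2, 8), 4))
```

With `beta = 1.0` this reduces to `2 * precision * recall / (precision + recall)`.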

Abstract

Recently, end-to-end mispronunciation detection and diagnosis (MD&D) systems have become a popular alternative that greatly simplifies the model-building process of conventional hybrid DNN-HMM systems by representing complicated modules with a single deep network architecture. In this paper, in order to utilize the prior text in the end-to-end structure, we present a novel text-dependent model that differs from SED-MDD: it achieves a fully end-to-end system by aligning the audio with the phoneme sequences of the prior text inside the model through the attention mechanism. Moreover, using the prior text as input introduces an imbalance between positive and negative samples in the phoneme sequence. To alleviate this problem, we propose three simple data augmentation methods, which effectively improve the ability of the model to capture mispronounced phonemes. We conduct experiments…
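
The abstract describes aligning audio with the phoneme sequence of the prior text through attention. As a rough illustration of that general idea, the sketch below applies plain scaled dot-product attention, with audio frame vectors as queries and phoneme embeddings as keys and values. The function names, the toy vectors, and the query/key direction are all assumptions made for this example; the paper's actual architecture may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(frames, phonemes):
    """Scaled dot-product attention (illustrative, not the paper's code).

    frames:   list of audio frame vectors, used as queries (assumption)
    phonemes: list of phoneme embeddings from the prior text, used as
              keys and values (assumption)
    Returns (weights, contexts): per-frame attention weights over the
    phonemes, and the weighted sum of phoneme embeddings per frame.
    """
    dim = len(phonemes[0])
    scale = math.sqrt(dim)
    weights, contexts = [], []
    for query in frames:
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in phonemes]
        w = softmax(scores)
        context = [sum(w_j * key[d] for w_j, key in zip(w, phonemes))
                   for d in range(dim)]
        weights.append(w)
        contexts.append(context)
    return weights, contexts


if __name__ == "__main__":
    # Toy 2-dimensional vectors, invented for illustration.
    frames = [[4.0, 0.0], [0.0, 4.0]]
    phonemes = [[1.0, 0.0], [0.0, 1.0]]
    weights, _ = attend(frames, phonemes)
    for row in weights:
        print([round(x, 3) for x in row])
```

Each row of `weights` sums to one and can be read as a soft alignment of one audio frame to the phonemes of the prior text.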

Code & Models

Repositories

cageyoko/CTC-Attention-Mispronunciation
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis