A multi-view approach for Mandarin non-native mispronunciation verification

Zhenyu Wang; John H.L. Hansen; Yanlu Xie

arXiv:2009.02573·eess.AS·September 10, 2020

TL;DR

This paper introduces a multi-view approach using bidirectional LSTM embeddings to improve Mandarin non-native mispronunciation verification, reducing annotation needs and outperforming traditional methods.

Contribution

The study presents a novel multi-view learning framework that jointly embeds acoustic and multi-source information for more accurate mispronunciation verification.

Findings

01

Achieved a +11.23% accuracy improvement over the GOP-based (Goodness of Pronunciation) approach

02

Outperformed the single-view approach by +1.47% in accuracy

03

Demonstrated effective use of contrastive loss in embedding models
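
As a rough illustration of what a contrastive loss over embeddings can look like, the sketch below uses a margin-based form on cosine distance. The triplet formulation, the `margin` value, and the function names are assumptions made for illustration; the summary here does not state the exact loss the authors used.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def contrastive_margin_loss(anchor, positive, negative, margin=0.5):
    """Hinge loss that is zero once the matching pair is closer than the
    mismatched pair by at least `margin` (hypothetical value)."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)


# Toy 2-D embeddings: the anchor matches `same` and is orthogonal to `other`.
anchor, same, other = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]
well_separated = contrastive_margin_loss(anchor, same, other)
badly_separated = contrastive_margin_loss(anchor, other, same)
```

Under this assumed form, the loss vanishes when the embeddings are already well separated and grows when a mismatched pair sits closer than the matching pair.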

Abstract

Traditionally, the performance of non-native mispronunciation verification systems has relied on effective phone-level labelling of non-native corpora. In this study, a multi-view approach that requires less annotation is proposed to incorporate discriminative feature representations for non-native mispronunciation verification of Mandarin. Here, models are jointly learned to embed acoustic sequences and multi-source information for speech attributes and bottleneck features. Bidirectional LSTM embedding models with contrastive losses are used to map acoustic sequences and multi-source information into fixed-dimensional embeddings. The distance between acoustic embeddings is taken as the similarity between phones. Accordingly, examples of mispronounced phones are expected to have a small similarity score with their canonical pronunciations. The approach shows improvement over GOP-based…
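
The verification step described in the abstract, scoring a produced phone against its canonical pronunciation in embedding space, might be sketched as follows. Cosine similarity, the `threshold` value, and the function names are illustrative assumptions; in the paper the embeddings come from trained bidirectional LSTM models, which are replaced here by hand-written vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def is_mispronounced(phone_emb, canonical_emb, threshold=0.5):
    """Flag a phone when its similarity to the canonical embedding
    falls below `threshold` (a hypothetical, untuned value)."""
    return cosine_similarity(phone_emb, canonical_emb) < threshold


# Stand-in embeddings; real ones would be produced by the embedding model.
canonical = [1.0, 0.0]
close_attempt = [0.9, 0.1]
far_attempt = [0.0, 1.0]

close_flag = is_mispronounced(close_attempt, canonical)
far_flag = is_mispronounced(far_attempt, canonical)
```

In practice the threshold would have to be tuned on held-out data rather than fixed in advance.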

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing