Investigation of Deep Neural Network Acoustic Modelling Approaches for Low Resource Accented Mandarin Speech Recognition

Xurong Xie; Xiang Sui; Xunying Liu; Lan Wang

arXiv:2201.09432 · eess.AS · June 17, 2024 · 1 citation

Open Access

TL;DR

This paper compares a range of deep neural network (DNN) acoustic modelling approaches for recognizing accented Mandarin speech in low-resource settings, with an emphasis on modelling accent-induced acoustic variability to improve recognition accuracy.

Contribution

It introduces an improved multi-level adaptive network (MLAN) tandem HMM system that explicitly uses accent information and outperforms the baseline systems on low-resource accented Mandarin speech.
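One common way to make accent information explicit to a DNN is to append an accent code to every input frame. This summary does not say that the paper injects accent information this way, so treat the following as an illustrative sketch only: the accent labels and the helper names `one_hot` and `append_accent_code` are hypothetical, and only the count of four accents comes from the abstract.

```python
from typing import List, Sequence

# Hypothetical accent inventory: the abstract only says the task covers four
# regional accents; these labels are placeholders, not the paper's names.
ACCENTS = ["accent_a", "accent_b", "accent_c", "accent_d"]


def one_hot(accent: str, inventory: Sequence[str] = ACCENTS) -> List[float]:
    """Return a one-hot vector identifying `accent` within `inventory`."""
    if accent not in inventory:
        raise ValueError(f"unknown accent: {accent!r}")
    return [1.0 if a == accent else 0.0 for a in inventory]


def append_accent_code(frames: List[List[float]], accent: str) -> List[List[float]]:
    """Append the same utterance-level accent code to every acoustic frame."""
    code = one_hot(accent)
    return [list(frame) + code for frame in frames]


# Two toy 3-dimensional frames from an utterance labelled accent_c.
frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
augmented = append_accent_code(frames, "accent_c")
print(augmented[0])  # [0.1, 0.2, 0.3, 0.0, 0.0, 1.0, 0.0]
```

Appending a fixed code per utterance keeps the acoustic front end unchanged and lets the network condition on accent; an implicit alternative would be to pool all accents without any code.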

Findings

01. The MLAN tandem HMM system outperforms the baseline by 0.8%-1.5% in character error rate (CER).

02. Explicit use of accent information improves recognition accuracy.

03. Multi-accent modelling techniques are effective in low-resource scenarios.
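The findings are stated in character error rate (CER), the standard metric for Mandarin recognition: the number of character substitutions, deletions and insertions needed to turn the reference into the hypothesis, divided by the number of reference characters. A minimal sketch using a character-level Levenshtein distance follows; the function names and the example sentences and CER values are hypothetical, not data from the paper. Because an improvement can be quoted as absolute or relative, the helper `cer_reduction` returns both.

```python
from typing import Sequence, Tuple


def edit_counts(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # empty ref prefix: j insertions
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)  # empty hyp prefix: i deletions
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of r
                cur[j - 1] + 1,          # insertion of h
                prev[j - 1] + (r != h),  # substitution (free if chars match)
            )
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent: (S + D + I) / N * 100."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return 100.0 * edit_counts(ref, hyp) / len(ref)


def cer_reduction(baseline: float, system: float) -> Tuple[float, float]:
    """Absolute (CER points) and relative (percent) reduction vs. a baseline."""
    absolute = baseline - system
    return absolute, 100.0 * absolute / baseline


# One substitution (气 -> 汽) and one deletion (好) over 6 reference characters.
print(round(cer("今天天气很好", "今天天汽很"), 2))  # 33.33
abs_gain, rel_gain = cer_reduction(12.0, 11.0)  # hypothetical CERs
print(abs_gain, round(rel_gain, 2))  # 1.0 8.33
```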

Abstract

The Mandarin Chinese language is known to be strongly influenced by a rich set of regional accents, while Mandarin speech data for each individual accent is quite low resource. Hence, an important task in Mandarin speech recognition is to appropriately model the acoustic variability imposed by accents. In this paper, an investigation of the implicit and explicit use of accent information across a range of deep neural network (DNN) based acoustic modelling techniques is conducted. In addition, approaches to multi-accent modelling, including multi-style training, multi-accent decision tree state tying, DNN tandem and multi-level adaptive network (MLAN) tandem hidden Markov model (HMM) modelling, are combined and compared. On a low resource accented Mandarin speech recognition task consisting of four regional accents, an improved MLAN tandem HMM system explicitly leveraging the accent information…
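The abstract lists MLAN tandem HMM modelling among the compared techniques. As generally described in the MLAN literature, a level-1 network produces bottleneck features, a level-2 network is fed the acoustic features concatenated with those level-1 bottleneck features, and the level-2 bottleneck output, concatenated with the acoustic features, forms the tandem features modelled by a GMM-HMM. The sketch below only traces that data flow for one frame. It is an assumption-laden illustration, not the paper's system: the dimensions, the random untrained weights, the `tanh(W x)` stand-in for a bottleneck layer, and all function names are hypothetical, and it does not model how the paper uses accent information.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Placeholder weights; a real system would train these, not sample them."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def bottleneck(x: Vector, w: Matrix) -> Vector:
    """Toy stand-in for a trained DNN's bottleneck layer: tanh(W x)."""
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w]


def mlan_tandem_features(acoustic: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Two-level MLAN-style feature extraction for one frame.

    Level 1 maps acoustic features to bottleneck features; level 2 sees the
    acoustic features concatenated with the level-1 bottleneck; the returned
    tandem vector is the acoustic features plus the level-2 bottleneck.
    """
    bn1 = bottleneck(acoustic, w1)
    bn2 = bottleneck(acoustic + bn1, w2)
    return acoustic + bn2


rng = random.Random(0)
ACOUSTIC_DIM, BN1_DIM, BN2_DIM = 13, 4, 4  # hypothetical sizes
w1 = random_matrix(BN1_DIM, ACOUSTIC_DIM, rng)
w2 = random_matrix(BN2_DIM, ACOUSTIC_DIM + BN1_DIM, rng)

frame = [rng.gauss(0.0, 1.0) for _ in range(ACOUSTIC_DIM)]
tandem = mlan_tandem_features(frame, w1, w2)
print(len(tandem))  # 17 = 13 acoustic + 4 level-2 bottleneck
```

The appeal of the two-level design in a low-resource setting is that the level-1 network can be trained on pooled data, while the level-2 network adapts those features to the target data.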

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing