Investigation of Deep Neural Network Acoustic Modelling Approaches for Low Resource Accented Mandarin Speech Recognition
Xurong Xie, Xiang Sui, Xunying Liu, Lan Wang

TL;DR
This paper explores various deep neural network approaches for recognizing accented Mandarin speech in low-resource settings, emphasizing the modeling of accent variability to improve recognition accuracy.
Contribution
It introduces an improved multi-level adaptive network tandem HMM system that explicitly uses accent information, outperforming baseline models on low-resource accented Mandarin speech.
Findings
The improved MLAN tandem HMM system outperforms the baseline by 0.8%-1.5% in character error rate (CER)
Explicit accent information improves recognition accuracy
Multi-accent modeling techniques are effective in low-resource scenarios
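The findings above are reported in character error rate (CER). As a minimal sketch of how that metric is conventionally computed (this is not the authors' scoring tool; the function names `edit_distance` and `character_error_rate` are hypothetical and chosen here for illustration), CER is the character-level Levenshtein distance between the reference and the recognized text, divided by the number of reference characters:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the first i-1 items of ref and the first j of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from the first i items of ref to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = character-level edit distance / number of reference characters.

    Assumption for this sketch: whitespace is ignored, since Mandarin
    transcripts are usually scored character by character.
    """
    ref_chars = [c for c in reference if not c.isspace()]
    hyp_chars = [c for c in hypothesis if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

For example, a six-character reference with one wrongly recognized character gives a CER of 1/6. Because insertions count as errors, CER can exceed 1.0 when the hypothesis is much longer than the reference.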
Abstract
The Mandarin Chinese language is known to be strongly influenced by a rich set of regional accents, while Mandarin speech with each accent is quite low resource. Hence, an important task in Mandarin speech recognition is to appropriately model the acoustic variabilities imposed by accents. In this paper, an investigation of implicit and explicit use of accent information on a range of deep neural network (DNN) based acoustic modelling techniques is conducted. Meanwhile, multi-accent modelling approaches including multi-style training, multi-accent decision tree state tying, DNN tandem and multi-level adaptive network (MLAN) tandem hidden Markov model (HMM) modelling are combined and compared in this paper. On a low resource accented Mandarin speech recognition task consisting of four regional accents, an improved MLAN tandem HMM system explicitly leveraging the accent information…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
