Articulatory information and Multiview Features for Large Vocabulary Continuous Speech Recognition
Vikramjit Mitra, Wen Wang, Chris Bartels, Horacio Franco, Dimitra Vergyri

TL;DR
This paper demonstrates that combining multi-view features and articulatory information in deep neural network models significantly improves large vocabulary continuous speech recognition accuracy, especially for spontaneous and non-native speech.
Contribution
It introduces a novel multi-view feature approach combined with articulatory data, leading to notable reductions in word error rates in speech recognition tasks.
Findings
Multi-view features reduce WER compared to single features.
Articulatory information further decreases WER, especially for non-native speech.
Achieved 12% relative WER reduction on NIST 2000 test sets.
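For readers unfamiliar with the metric, the sketch below shows how a word error rate and a relative WER reduction are typically computed. It is a minimal illustration, not the scoring tool used in the paper; the function names and the example numbers (20.0 and 17.6) are hypothetical and chosen only so that the arithmetic comes out near 12%.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer, new_wer):
    """Fraction of the baseline WER removed by the new system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs (not taken from the paper): 20.0% -> 17.6%
reduction = relative_wer_reduction(20.0, 17.6)
print(f"relative WER reduction: {reduction:.1%}")
```

A relative reduction is measured against the baseline's own error rate, so the same absolute gain counts for more when the baseline WER is already low.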
Abstract
This paper explores the use of multi-view features and their discriminative transforms in a convolutional deep neural network (CNN) architecture for a continuous large vocabulary speech recognition task. Mel-filterbank energies and perceptually motivated forced damped oscillator coefficient (DOC) features are used after feature-space maximum-likelihood linear regression (fMLLR) transforms, which are combined and fed as a multi-view feature to a single CNN acoustic model. Use of multi-view feature representation demonstrated significant reduction in word error rates (WERs) compared to the use of individual features by themselves. In addition, when articulatory information was used as an additional input to a fused deep neural network (DNN) and CNN acoustic model, it was found to demonstrate further reduction in WER for the Switchboard subset and the CallHome subset (containing partly…
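The multi-view idea in the abstract can be sketched as follows: each feature stream is passed through its own affine transform (fMLLR is an affine transform of the feature vector, usually estimated per speaker), and the transformed streams are then joined frame by frame. This is only an illustration under stated assumptions: the transforms are supplied rather than estimated, the streams are joined by simple concatenation, and the dimensions, values, and function names are hypothetical and do not come from the paper.

```python
def affine_transform(frame, matrix, bias):
    """Apply y = A x + b to one feature frame (a list of floats)."""
    return [
        sum(a * x for a, x in zip(row, frame)) + b
        for row, b in zip(matrix, bias)
    ]


def multiview_frames(view_a, view_b, transform_a, transform_b):
    """Transform two time-aligned feature streams and concatenate per frame.

    Each transform is a (matrix, bias) pair. In a real system these would be
    fMLLR transforms estimated by maximum likelihood; here they are given.
    """
    if len(view_a) != len(view_b):
        raise ValueError("both views must have the same number of frames")
    combined = []
    for frame_a, frame_b in zip(view_a, view_b):
        transformed_a = affine_transform(frame_a, *transform_a)
        transformed_b = affine_transform(frame_b, *transform_b)
        combined.append(transformed_a + transformed_b)  # list concatenation
    return combined


# Toy 2-dimensional streams standing in for filterbank and DOC features.
fbank = [[1.0, 2.0], [3.0, 4.0]]
doc = [[0.5, 0.5], [1.0, 0.0]]
identity = ([[1, 0], [0, 1]], [0, 0])
scale_and_shift = ([[2, 0], [0, 2]], [1, 1])

features = multiview_frames(fbank, doc, identity, scale_and_shift)
print(features)
```

The combined frames would then be the input to a single acoustic model, which is what distinguishes this setup from training one model per feature type and merging their outputs.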
Taxonomy
Methods: Linear Regression
