Exploiting Low-dimensional Structures to Enhance DNN Based Acoustic Modeling in Speech Recognition
Pranay Dighe, Gil Luyet, Afsaneh Asaei, Herve Bourlard

TL;DR
This paper models DNN-based acoustic posteriors as a union of low-dimensional subspaces, which improves speech recognition accuracy in both clean and noisy conditions.
Contribution
It presents a low-dimensional subspace modeling technique for DNN posteriors that enhances acoustic modeling by correcting spurious errors caused by mismatched conditions.
Findings
Achieved up to a 15.4% relative reduction in word error rate (WER) on continuous speech recognition.
Demonstrated the effectiveness of low-dimensional structures in improving DNN acoustic models.
Validated improvements in both clean and noisy speech recognition scenarios.
Abstract
We propose to model the acoustic space of deep neural network (DNN) class-conditional posterior probabilities as a union of low-dimensional subspaces. To that end, the training posteriors are used for dictionary learning and sparse coding. Sparse representation of the test posteriors using this dictionary enables projection to the space of training data. Relying on the fact that the intrinsic dimensions of the posterior subspaces are indeed very small and the matrix of all posteriors belonging to a class has a very low rank, we demonstrate how low-dimensional structures enable further enhancement of the posteriors and rectify the spurious errors due to mismatch conditions. The enhanced acoustic modeling method leads to improvements in a continuous speech recognition task using the hybrid DNN-HMM (hidden Markov model) framework in both clean and noisy conditions, where up to 15.4% relative…
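The projection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it swaps dictionary learning and sparse coding for a plain truncated SVD, which only captures the low-rank property of one class's posteriors. The data is synthetic, and the names `fit_class_subspace` and `enhance_posterior`, the rank of 2, and the noise level are all assumptions made for illustration.

```python
import numpy as np


def fit_class_subspace(train_posteriors, rank):
    """Return an orthonormal basis (num_classes x rank) for one class.

    Rows of train_posteriors are frames, columns are DNN output classes.
    Assumption: a truncated SVD stands in for the paper's dictionary learning.
    """
    _, _, vt = np.linalg.svd(train_posteriors, full_matrices=False)
    return vt[:rank].T


def enhance_posterior(posterior, basis, eps=1e-10):
    """Project a posterior onto the subspace, then renormalize it."""
    projected = basis @ (basis.T @ posterior)
    # Projection can produce small negative entries, so clip before normalizing.
    projected = np.clip(projected, eps, None)
    return projected / projected.sum()


rng = np.random.default_rng(0)
num_classes, num_frames, rank = 20, 200, 2

# Synthetic training posteriors: mixtures of two prototype distributions,
# so the training matrix has rank 2 by construction.
prototypes = rng.dirichlet(np.full(num_classes, 0.5), size=rank)
weights = rng.dirichlet(np.ones(rank), size=num_frames)
train = weights @ prototypes

basis = fit_class_subspace(train, rank)

# A "test" posterior from the same class, corrupted by hypothetical noise.
clean = np.array([0.7, 0.3]) @ prototypes
noisy = clean + 0.05 * rng.random(num_classes)
noisy /= noisy.sum()

enhanced = enhance_posterior(noisy, basis)
print("noisy error:   ", np.linalg.norm(noisy - clean))
print("enhanced error:", np.linalg.norm(enhanced - clean))
```

A single linear subspace per class is the simplest case of the union-of-subspaces model. The sparse coding described in the abstract would instead select a few dictionary atoms per frame, which this sketch does not attempt.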
