Low-rank and Sparse Soft Targets to Learn Better DNN Acoustic Models
Pranay Dighe, Afsaneh Asaei, Herve Bourlard

TL;DR
This paper uses low-rank and sparse reconstructions of DNN senone probabilities as soft targets for training DNN acoustic models, which lowers the word error rate of the resulting speech recognizer.
Contribution
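It uses principal component analysis (PCA) and sparse coding to remove unstructured noise from the senone probabilities of a DNN trained on binary labels. The resulting enhanced probabilities serve as soft targets that avoid the inaccuracies of binary labels derived from conventional GMM-HMM alignments.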
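The PCA step can be illustrated with a minimal NumPy sketch. It assumes one PCA subspace per senone, fitted on DNN posterior vectors grouped by their senone label (for example from a forced alignment). It also assumes a hypothetical variance threshold `energy` that sets each subspace's rank, and clipping plus renormalization so that every reconstruction is a valid probability vector. The names `fit_senone_subspaces` and `low_rank_soft_targets` are illustrative rather than taken from the paper, and details such as working in the probability or log domain may differ from the authors' setup.

```python
import numpy as np


def fit_senone_subspaces(posteriors, labels, energy=0.9):
    """Fit one PCA subspace per senone class (illustrative sketch).

    posteriors: (N, K) DNN senone posterior vectors, rows summing to 1.
    labels:     (N,) senone index of each frame (assumed given, e.g. by alignment).
    energy:     hypothetical fraction of variance each subspace must retain.
    Returns {senone: (mean vector, basis with orthonormal columns)}.
    """
    subspaces = {}
    for k in np.unique(labels):
        X = posteriors[labels == k]
        mean = X.mean(axis=0)
        # SVD of the centred data: rows of vt are the principal directions.
        _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
        var = s ** 2
        total = var.sum()
        if total == 0:
            rank = 0  # a single frame or identical frames: keep only the mean
        else:
            # Smallest rank whose cumulative variance reaches the threshold.
            ratio = np.cumsum(var) / total
            rank = min(int(np.searchsorted(ratio, energy)) + 1, len(s))
        subspaces[int(k)] = (mean, vt[:rank].T)
    return subspaces


def low_rank_soft_targets(posteriors, labels, subspaces, eps=1e-8):
    """Project each posterior onto its senone's subspace and renormalise."""
    targets = np.empty_like(posteriors)
    for i, (p, k) in enumerate(zip(posteriors, labels)):
        mean, basis = subspaces[int(k)]
        recon = mean + basis @ (basis.T @ (p - mean))
        recon = np.clip(recon, eps, None)  # projection can produce negative entries
        targets[i] = recon / recon.sum()   # back to a valid probability vector
    return targets
```

Lowering `energy` keeps fewer principal directions and so discards more of the variation that this sketch treats as noise; with `energy=1.0` the reconstruction returns the input posteriors unchanged.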
Findings
Achieved a 4.6% relative reduction in word error rate on the AMI corpus.
Demonstrated that low-rank and sparse reconstructions of senone probabilities improve acoustic modeling.
Showed that the enhanced soft targets enable training on untranscribed data.
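Training on enhanced soft targets changes only the target vector in the usual cross-entropy loss, -sum_k t_k log p_k. The sketch below uses a single linear-plus-softmax layer as a stand-in for the DNN and plain full-batch gradient descent with a hypothetical learning rate. All names are illustrative. It relies on the standard fact that the gradient of this loss with respect to the logits is p - t, for one-hot and soft targets alike.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def soft_target_cross_entropy(logits, targets):
    """Mean of -sum_k t_k log p_k; equals the usual loss when targets are one-hot."""
    log_p = logits - logits.max(axis=1, keepdims=True)
    log_p = log_p - np.log(np.exp(log_p).sum(axis=1, keepdims=True))
    return float(-(targets * log_p).sum(axis=1).mean())


def train_softmax_layer(features, targets, lr=0.5, epochs=200):
    """Fit a linear+softmax layer to (soft) targets by full-batch gradient descent.

    Hypothetical stand-in for DNN training: d(loss)/d(logits) = (p - t) / N.
    """
    n, d = features.shape
    W = np.zeros((d, targets.shape[1]))
    b = np.zeros(targets.shape[1])
    for _ in range(epochs):
        grad = (softmax(features @ W + b) - targets) / n
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b
```

Because one-hot `targets` reduce the same code to ordinary training on hard labels, the enhanced soft targets can be swapped in without changing the optimizer. For untranscribed data, the targets would come from the enhanced DNN outputs instead of an alignment.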
Abstract
Conventional deep neural networks (DNN) for speech acoustic modeling rely on Gaussian mixture models (GMM) and hidden Markov models (HMM) to obtain binary class labels as the targets for DNN training. Subword classes in speech recognition systems correspond to context-dependent tied states or senones. The present work addresses some limitations of GMM-HMM senone alignments for DNN training. We hypothesize that the senone probabilities obtained from a DNN trained with binary labels can provide more accurate targets to learn better acoustic models. However, DNN outputs bear inaccuracies which are exhibited as high-dimensional unstructured noise, whereas the informative components are structured and low-dimensional. We exploit principal component analysis (PCA) and sparse coding to characterize the senone subspaces. Enhanced probabilities obtained from low-rank and sparse reconstructions…
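The sparse-coding reconstruction can be sketched in the same style. The abstract does not specify the dictionary or the solver. This sketch therefore assumes a given senone-specific dictionary `D` whose unit-norm columns are, for example, normalized training posteriors of that senone. It swaps in orthogonal matching pursuit (OMP) as the sparse solver. The names `omp`, `sparse_soft_target` and `n_nonzero` are hypothetical.

```python
import numpy as np


def omp(D, x, n_nonzero):
    """Orthogonal matching pursuit: approximate x with at most n_nonzero columns of D.

    D: (K, M) dictionary with unit-norm columns; x: (K,) signal.
    Returns the length-M sparse coefficient vector.
    """
    residual = x.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the unexplained part of x.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break  # residual is (numerically) zero; no new atom helps
        support.append(j)
        # Least-squares refit on all selected atoms, then update the residual.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha


def sparse_soft_target(D, p, n_nonzero=3, eps=1e-8):
    """Reconstruct posterior p from the senone dictionary D and renormalise."""
    recon = D @ omp(D, p, n_nonzero)
    recon = np.clip(recon, eps, None)  # reconstruction may dip below zero
    return recon / recon.sum()
```

In this sketch, `n_nonzero` limits each reconstruction to a few atoms of the senone's own dictionary. This follows the abstract's view that the informative components are structured and low-dimensional while the noise is not.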
