Modeling State-Conditional Observation Distribution using Weighted Stereo Samples for Factorial Speech Processing Models
Mahdi Khademian, Mohammad Mehdi Homayounpour

TL;DR
This paper introduces an approach for modeling state-conditional observation distributions in factorial speech processing models using weighted stereo samples, which improves noise-robust speech recognition, especially in low-SNR conditions.
Contribution
It extends previous single pass retraining methods to support multiple audio sources and allows independent feature space selection for source and noisy features.
Findings
Up to 4% absolute improvement in word recognition accuracy on set A of Aurora 2, concentrated in low-SNR conditions.
Effective modeling of non-stationary noise as multiple source states.
Enhanced flexibility in feature space selection for noisy speech recognition.
Abstract
This paper investigates the effectiveness of factorial speech processing models in noise-robust automatic speech recognition tasks. To this end, it proposes an idealistic approach for modeling the state-conditional observation distribution of factorial models based on weighted stereo samples. The approach extends the earlier single pass retraining method for ideal model compensation to support multiple audio sources. Non-stationary noise can be treated as one of these audio sources, with multiple states. Experiments on set A of the Aurora 2 dataset show that treating noise this way improves recognition performance. The improvement is significant in low signal-to-noise ratio conditions, reaching up to 4% absolute word recognition accuracy. In addition to the power of the proposed method in accurate representation of state-conditional…
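The idea of estimating a state-conditional observation distribution from weighted stereo samples can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each noisy frame comes with per-source state posteriors (obtained from the separately available source signals, as in single pass retraining), that the weight of a frame for a joint (speech state, noise state) pair is the product of those posteriors, and that each joint state is modeled by a single diagonal Gaussian. The function name `joint_state_gaussians` and all parameter names are hypothetical.

```python
import numpy as np

def joint_state_gaussians(noisy, gamma_speech, gamma_noise, eps=1e-8):
    """Weighted Gaussian estimates of noisy features per joint state.

    noisy:        (T, D) noisy-speech feature frames
    gamma_speech: (T, S) speech-state posteriors per frame
    gamma_noise:  (T, N) noise-state posteriors per frame
    Returns means and diagonal variances, each of shape (S, N, D).
    """
    # Assumption: frame weight for joint state (s, n) is the product
    # of the two per-source posteriors.
    w = gamma_speech[:, :, None] * gamma_noise[:, None, :]  # (T, S, N)
    occ = w.sum(axis=0) + eps                                # (S, N)

    # Weighted first and second moments of the noisy features.
    means = np.einsum("tsn,td->snd", w, noisy) / occ[..., None]
    second = np.einsum("tsn,td->snd", w, noisy ** 2) / occ[..., None]
    variances = np.maximum(second - means ** 2, eps)
    return means, variances

# Toy usage: 4 frames, 1-dim features, 2 speech states, 1 noise state.
noisy = np.array([[1.0], [3.0], [10.0], [20.0]])
gamma_speech = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
gamma_noise = np.ones((4, 1))
means, variances = joint_state_gaussians(noisy, gamma_speech, gamma_noise)
```

With hard (one-hot) posteriors, as in the toy usage, this reduces to per-state sample means and variances. Soft posteriors let one frame contribute to several joint states. Because the statistics are accumulated directly on the noisy features, the feature space of the noisy observations need not match the one used to compute the source posteriors.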
