Modeling State-Conditional Observation Distribution using Weighted Stereo Samples for Factorial Speech Processing Models

Mahdi Khademian; Mohammad Mehdi Homayounpour

arXiv:1503.02578 · cs.LG · October 6, 2016

TL;DR

This paper models state-conditional observation distributions in factorial speech processing models using weighted stereo samples, improving noise-robust speech recognition by up to 4% absolute word recognition accuracy in low-SNR conditions.

Contribution

The method extends previous single-pass retraining to support multiple audio sources and allows the feature spaces for source features and noisy features to be selected independently.

Findings

01. Up to 4% absolute improvement in word recognition accuracy in noisy conditions.

02. Effective modeling of non-stationary noise as multiple source states.

03. Enhanced flexibility in feature space selection for noisy speech recognition.

Abstract

This paper investigates the effectiveness of factorial speech processing models in noise-robust automatic speech recognition tasks. To this end, it proposes an idealized approach for modeling the state-conditional observation distribution of factorial models based on weighted stereo samples. The approach extends previous single-pass retraining for ideal model compensation to support multiple audio sources. Non-stationary noise can be treated as one of these audio sources with multiple states. Experiments on set A of the Aurora 2 dataset show that recognition performance improves under this treatment. The improvement is significant in low signal-to-noise ratio conditions, reaching up to 4% absolute word recognition accuracy. In addition to the power of the proposed method in accurate representation of state-conditional…
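As a rough illustration of what fitting a state-conditional observation distribution from weighted stereo samples could look like, the sketch below estimates one diagonal-covariance Gaussian per joint (speech state, noise state) pair from noisy frames. It is not the authors' implementation. The function names, the diagonal-Gaussian form, and the choice of frame weight (the product of per-source state posteriors, assumed to come from aligning each clean source stream) are all assumptions made for this example.

```python
def weighted_gaussian(frames, weights, var_floor=1e-6):
    """Weighted mean and diagonal variance of a list of feature frames."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")
    dim = len(frames[0])
    mean = [
        sum(w * x[d] for x, w in zip(frames, weights)) / total
        for d in range(dim)
    ]
    var = [
        max(
            sum(w * (x[d] - mean[d]) ** 2 for x, w in zip(frames, weights)) / total,
            var_floor,  # floor keeps variances strictly positive
        )
        for d in range(dim)
    ]
    return mean, var


def fit_joint_state_gaussians(noisy_frames, speech_post, noise_post):
    """Fit one Gaussian per joint (speech state, noise state) pair.

    noisy_frames[t] : feature vector of the noisy (mixed) signal at frame t
    speech_post[t][i]: posterior of speech state i at frame t (hypothetical input)
    noise_post[t][j] : posterior of noise state j at frame t (hypothetical input)

    Assumption for this sketch: the weight of frame t for joint state (i, j)
    is the product of the two per-source posteriors.
    """
    n_speech = len(speech_post[0])
    n_noise = len(noise_post[0])
    models = {}
    for i in range(n_speech):
        for j in range(n_noise):
            weights = [sp[i] * nz[j] for sp, nz in zip(speech_post, noise_post)]
            if sum(weights) > 0.0:  # skip joint states that never occur
                models[(i, j)] = weighted_gaussian(noisy_frames, weights)
    return models


# Toy usage: two speech states, one noise state, one-dimensional features.
noisy_frames = [[0.0], [2.0], [10.0], [12.0]]
speech_post = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
noise_post = [[1.0], [1.0], [1.0], [1.0]]
models = fit_joint_state_gaussians(noisy_frames, speech_post, noise_post)
```

In this toy setup the first two frames belong entirely to speech state 0 and the last two to speech state 1, so each joint state receives its own mean and variance computed from the noisy frames alone. A non-stationary noise source would simply contribute more than one column in `noise_post`.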
