Feature Joint-State Posterior Estimation in Factorial Speech Processing Models using Deep Neural Networks

Mahdi Khademian; Mohammad Mehdi Homayounpour

arXiv:1707.02661 · cs.SD · July 11, 2017

Open Access

TL;DR

This paper introduces a neural network architecture for estimating joint-state posteriors in factorial speech processing models. In the monaural speech separation and recognition challenge, it improves on the vector Taylor series method by 2.3% absolute.

Contribution

It presents a novel neural network architecture and objective function for extracting joint-state posteriors from stereo features in factorial speech processing models.

Findings

01

Achieved a 2.3% absolute performance improvement over the vector Taylor series method in the monaural speech separation and recognition challenge.

02

Demonstrated that neural network decoding outperforms the vector Taylor series method.

03

Simplified joint-state posterior extraction process.

Abstract

This paper proposes a new method that uses deep neural networks to calculate joint-state posteriors of mixed-audio features for use in factorial speech processing models. Factorial models require the joint-state posterior information to perform joint decoding. The novelty of this work is its architecture, which enables the network to infer joint-state posteriors from pairs of state posteriors of stereo features. This paper defines an objective function to solve an underdetermined system of equations, which the network uses to extract joint-state posteriors. It develops the required expressions for fine-tuning the network in a unified way. The experiments compare the proposed network's decoding results to those of the vector Taylor series method and show a 2.3% absolute performance improvement in the monaural speech separation and recognition challenge. This achievement…
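
To illustrate why recovering joint-state posteriors from a pair of per-source state posteriors can be underdetermined, the minimal sketch below builds two different joint distributions that share the same marginals. It assumes the equations in question are marginalization constraints (row and column sums of the joint), which the abstract does not state explicitly. The names `marginals` and `independent_joint` and the numeric values are hypothetical, and this is not the paper's objective function or network.

```python
def marginals(joint):
    """Return (row sums, column sums) of a joint-posterior matrix."""
    rows = [sum(r) for r in joint]
    cols = [sum(c) for c in zip(*joint)]
    return rows, cols


def independent_joint(p, q):
    """Outer product of p and q: one joint consistent with both marginals."""
    return [[pi * qj for qj in q] for pi in p]


# Hypothetical per-source state posteriors for a single frame (assumed values).
p = [0.5, 0.5]  # source 1, two states
q = [0.5, 0.5]  # source 2, two states

# Two different joints that both marginalize to p and q.
joint_a = independent_joint(p, q)    # states of the two sources independent
joint_b = [[0.5, 0.0], [0.0, 0.5]]   # states of the two sources fully coupled

# Unknowns: one entry per state pair. Independent linear constraints:
# row sums plus column sums, minus one because both sets sum to the same total.
n_unknowns = len(p) * len(q)
n_constraints = len(p) + len(q) - 1

print(marginals(joint_a) == marginals(joint_b))  # same marginals
print(joint_a != joint_b)                        # different joints
print(n_unknowns, n_constraints)
```

Because the number of unknowns grows with the product of the state counts while the constraints grow only with their sum, the marginals alone cannot pin down the joint, so some additional criterion is needed to select a solution.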

Taxonomy

Topics: Speech and Audio Processing · Blind Source Separation Techniques · Advanced Adaptive Filtering Techniques