Integration of deep learning with expectation maximization for spatial cue based speech separation in reverberant conditions

Sania Gul; Muhammad Salman Khan; Syed Waqar Shah

arXiv:2102.13334 · eess.AS · March 1, 2021

TL;DR

This paper presents a framework that combines deep learning with a probabilistic expectation maximization (EM) algorithm for speech separation in reverberant environments. It improves on both an EM-based baseline (MESSL-GS) and a U-Net-based baseline (SONET).

Contribution

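It introduces an integrated blind source separation (BSS) model in which a pre-trained U-Net clusters interaural level difference (ILD) cues and an EM algorithm clusters interaural phase difference (IPD) cues. This design exploits the complementary strengths of supervised deep learning and unsupervised EM clustering.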
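Both halves of the model operate on interaural spatial cues. The minimal NumPy sketch below shows how these cues can be computed. The ILD is the per-bin magnitude ratio of the left and right STFTs in dB, and the IPD is the phase of their cross-spectrum. The function names and the STFT settings (512-point Hann window, hop of 128) are illustrative assumptions, not the paper's configuration.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT; returns a complex array of shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T


def interaural_cues(left, right, n_fft=512, hop=128, eps=1e-10):
    """Return (ILD in dB, IPD in radians) for every time-frequency bin.

    Hypothetical helper for illustration; n_fft and hop are assumed values.
    """
    spec_l = stft(left, n_fft, hop)
    spec_r = stft(right, n_fft, hop)
    # ILD: level ratio of the two ears in dB (eps guards against log(0)).
    ild = 20.0 * np.log10((np.abs(spec_l) + eps) / (np.abs(spec_r) + eps))
    # IPD: phase of the cross-spectrum, wrapped to (-pi, pi].
    ipd = np.angle(spec_l * np.conj(spec_r))
    return ild, ipd


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left = rng.standard_normal(16000)
    right = 0.5 * left  # right channel 6 dB quieter, no time delay
    ild, ipd = interaural_cues(left, right)
    print(ild.shape, round(float(np.median(ild)), 2))  # prints (257, 122) 6.02
```

In a real binaural recording, each source's position produces its own characteristic ILD and IPD pattern. That is what makes clustering these two maps a route to separation.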

Findings

1. Average 4.3 dB SDR improvement over the EM-based MESSL-GS
2. 4.3% increase in speech intelligibility (STOI) over MESSL-GS
3. 4.5 dB SDR gain over the U-Net-based SONET
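The SDR findings compare methods as differences in dB. As a rough illustration only, the sketch below uses the plain energy-ratio form of SDR, 10 log10(‖s‖² / ‖s − ŝ‖²). The BSS Eval SDR commonly reported in separation papers additionally allows a short distortion filter on the reference before measuring the error, so values from this simplified version will differ. The function names are hypothetical.

```python
import numpy as np


def simple_sdr(reference, estimate, eps=1e-12):
    """Energy-ratio SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Simplified metric for illustration; it is NOT the BSS Eval SDR.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    error = reference - estimate
    return 10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(error ** 2) + eps))


def sdr_gain(reference, estimate_a, estimate_b):
    """How many dB better method A recovers the reference source than method B."""
    return simple_sdr(reference, estimate_a) - simple_sdr(reference, estimate_b)


if __name__ == "__main__":
    s = np.ones(100)
    print(round(simple_sdr(s, 0.9 * s), 2))         # 20.0
    print(round(sdr_gain(s, 0.9 * s, 0.8 * s), 2))  # 6.02
```

Because SDR is logarithmic, a gain of about 4.3 dB corresponds to shrinking the error energy by roughly a factor of 10^0.43 ≈ 2.7 relative to the baseline.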

Abstract

In this paper, we formulate a blind source separation (BSS) framework that integrates a U-Net-based deep learning source separation network with a probabilistic spatial machine learning expectation maximization (EM) algorithm for separating speech in reverberant conditions. Our proposed model uses a pre-trained deep convolutional neural network, U-Net, to cluster the interaural level difference (ILD) cues and the EM algorithm to cluster the interaural phase difference (IPD) cues. The integrated model exploits the complementary strengths of the two approaches to BSS. It combines the strong modeling power of supervised neural networks with the simplicity of unsupervised machine learning algorithms, whose few parameters can be estimated from as little as a single segment of an audio mixture. The results show an average improvement of 4.3 dB in…
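The unsupervised half can be illustrated with a deliberately simplified EM sketch. It fits a one-dimensional Gaussian mixture, with one component per source, to the IPD values of all time-frequency bins, and its posteriors serve as soft masks. The sketch ignores phase wrapping and frequency dependence, so it is only reasonable at low frequencies with small inter-microphone delays. EM-based spatial clustering such as MESSL uses a richer, frequency-dependent model. This is not the authors' implementation, and all names below are hypothetical.

```python
import numpy as np


def em_gmm_1d(x, n_components=2, n_iter=50, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture with EM.

    Returns (means, variances, weights, posteriors), where posteriors has
    shape (len(x), n_components) and each row sums to 1.
    """
    x = np.asarray(x, dtype=float).ravel()
    # Deterministic initialisation: spread the means over the data quantiles.
    means = np.quantile(x, (np.arange(n_components) + 0.5) / n_components)
    variances = np.full(n_components, np.var(x) + var_floor)
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: log of weight * Gaussian likelihood for every (bin, component) pair.
        log_p = (np.log(weights)
                 - 0.5 * np.log(2.0 * np.pi * variances)
                 - 0.5 * (x[:, None] - means) ** 2 / variances)
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_p)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances from the soft assignments.
        counts = post.sum(axis=0)
        weights = counts / x.size
        means = (post * x[:, None]).sum(axis=0) / counts
        variances = (post * (x[:, None] - means) ** 2).sum(axis=0) / counts + var_floor
    return means, variances, weights, post


def ipd_soft_masks(ipd, n_sources=2, n_iter=50):
    """Cluster an IPD spectrogram (freq x time) into per-source soft masks."""
    means, _, _, post = em_gmm_1d(ipd, n_sources, n_iter)
    # Row i of post belongs to ipd.ravel()[i], so transpose then reshape back.
    masks = post.T.reshape((n_sources,) + np.shape(ipd))
    return masks, means


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic IPD map: the first 25 frames are dominated by a source near
    # -0.8 rad and the last 25 frames by a source near +0.6 rad.
    ipd = np.concatenate([rng.normal(-0.8, 0.1, (20, 25)),
                          rng.normal(0.6, 0.1, (20, 25))], axis=1)
    masks, means = ipd_soft_masks(ipd)
    print(masks.shape, np.sort(means))
```

Only a handful of parameters (a mean, a variance and a weight per source) are estimated, directly from the mixture being separated. This illustrates why such unsupervised models need so little data, in contrast to a trained U-Net.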

Taxonomy

Methods: Concatenated Skip Connection · Convolution · Max Pooling · U-Net