Integration of deep learning with expectation maximization for spatial cue based speech separation in reverberant conditions
Sania Gul, Muhammad Salman Khan, Syed Waqar Shah

TL;DR
This paper presents a novel framework that combines deep learning with a probabilistic expectation maximization (EM) algorithm for speech separation in reverberant environments, improving signal-to-distortion ratio (SDR) and speech intelligibility (STOI) over both EM-based and U-Net-based baselines.
Contribution
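It introduces an integrated blind source separation (BSS) model in which a pre-trained U-Net clusters interaural level difference (ILD) cues and an EM algorithm clusters interaural phase difference (IPD) cues, leveraging the complementary strengths of supervised and unsupervised learning for better speech separation.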
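In the integrated model, the EM half clusters spatial cues without supervision (the abstract applies it to IPD). The exact mixture model is not given here, so the sketch below is only a plain 1-D Gaussian mixture fitted by EM, meant to illustrate the E-step/M-step loop. It assumes IPD values that do not wrap around ±π (roughly true at low frequencies); EM-based systems such as MESSL use a richer, delay-dependent IPD model. The function name `em_gmm_1d` and its parameters are hypothetical.

```python
import numpy as np


def em_gmm_1d(x, k=2, n_iter=50):
    """Fit a k-component 1-D Gaussian mixture to samples x with EM.

    Hypothetical illustration, not the authors' implementation.
    Assumes x holds non-wrapping IPD values in radians.
    Returns (weights, means, variances, responsibilities).
    """
    x = np.asarray(x, dtype=float)
    # Deterministic initialisation: means at evenly spaced quantiles,
    # shared variance equal to the data variance, uniform weights.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var() + 1e-6)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log of weight * Gaussian likelihood for every (sample, component).
        log_p = (np.log(weights)
                 - 0.5 * np.log(2.0 * np.pi * variances)
                 - 0.5 * (x[:, None] - means) ** 2 / variances)
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)  # posterior per component
        # M-step: re-estimate parameters from the soft assignments.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    return weights, means, variances, resp
```

With two synthetic IPD clusters, for example 500 samples around -0.8 rad and 500 around 0.6 rad, the fitted means should land near those centres. Each row of `resp` is the posterior over sources for one time-frequency bin, which is how such a clustering yields soft separation masks.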
Findings
Average 4.3 dB SDR improvement over the EM-based MESSL-GS
4.3% increase in speech intelligibility (STOI) over MESSL-GS
4.5 dB SDR gain over the U-Net-based SONET
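The findings are stated as SDR differences in dB. As a rough illustration, assuming the simplest definition SDR = 10 log10(||s||^2 / ||s - ŝ||^2), the sketch below computes SDR and the gain of one estimate over another. Published results usually rely on the BSS Eval SDR, which additionally allows a short distortion filter on the reference, so numbers from these hypothetical helpers (`sdr_db`, `sdr_improvement_db`) will not match the paper's.

```python
import numpy as np


def sdr_db(reference, estimate, eps=1e-12):
    """Plain signal-to-distortion ratio in dB: 10*log10(||s||^2 / ||s - s_hat||^2).

    Simplified assumption: no BSS Eval distortion-filter projection.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    error = reference - estimate
    return 10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(error ** 2) + eps))


def sdr_improvement_db(reference, estimate_a, estimate_b):
    """SDR gain (dB) of estimate_a over estimate_b for the same reference."""
    return sdr_db(reference, estimate_a) - sdr_db(reference, estimate_b)
```

Because SDR is logarithmic, a 4.3 dB gain for the same reference corresponds to roughly 10^0.43 ≈ 2.7 times less distortion energy.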
Abstract
In this paper, we formulate a blind source separation (BSS) framework that integrates a U-Net-based deep learning source separation network with a probabilistic spatial machine learning algorithm, expectation maximization (EM), for separating speech in reverberant conditions. Our proposed model uses a pre-trained convolutional neural network, U-Net, to cluster the interaural level difference (ILD) cues and the EM algorithm to cluster the interaural phase difference (IPD) cues. The integrated model exploits the complementary strengths of the two approaches to BSS: the strong modeling power of supervised neural networks and the ease of use of unsupervised machine learning algorithms, whose few parameters can be estimated from as little as a single segment of an audio mixture. The results show an average improvement of 4.3 dB in…
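Per the abstract, the U-Net operates on ILD cues and EM on IPD cues, both derived from the left and right channels of a binaural mixture. The abstract does not specify the front end, so the sketch below assumes a Hann-windowed STFT with hypothetical settings (n_fft=512, hop=128) and the common per-bin definitions ILD = 20 log10(|L|/|R|) dB and IPD = angle(L conj(R)). The function names `stft` and `interaural_cues` are illustrative.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Minimal Hann-windowed STFT; returns shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def interaural_cues(left, right, n_fft=512, hop=128, eps=1e-10):
    """Return (ILD in dB, IPD in radians) for every time-frequency bin.

    Hypothetical front end; the paper's exact STFT settings are not given here.
    """
    L = stft(np.asarray(left, dtype=float), n_fft, hop)
    R = stft(np.asarray(right, dtype=float), n_fft, hop)
    # ILD: log-magnitude ratio between the two channels.
    ild = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
    # IPD: phase of the cross-spectrum, wrapped to (-pi, pi].
    ipd = np.angle(L * np.conj(R))
    return ild, ipd
```

For a 1 kHz tone sampled at 16 kHz with the right channel at half the amplitude of the left, the ILD at the tone's bin is about 6.02 dB and the IPD is 0; delaying the right channel by two samples instead shifts the IPD at that bin to about π/4.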
Taxonomy
Methods: Concatenated Skip Connection · Convolution · Max Pooling · U-Net
