Refining DNN-based Mask Estimation using CGMM-based EM Algorithm for Multi-channel Noise Reduction
Julitta Bartolewska, Stanisław Kacprzak, Konrad Kowalczyk

TL;DR
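This paper introduces a multi-channel technique for refining time-frequency masks estimated by single-channel DNNs. A CGMM-based EM algorithm refines the masks, and spatial filtering is then applied, which improves speech quality in multi-channel noise reduction beyond what the DNN models achieve on their own.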
Contribution
It proposes an iterative refinement method that combines a CGMM-based EM algorithm with spatial filtering to improve DNN-derived time-frequency masks for speech enhancement.
Findings
Refined masks are more accurate, as measured by a higher Area Under the ROC Curve (AUC)
Enhanced speech quality improves, as measured by PESQ improvement
The improvement is consistent across all three DNN models tested (DCUnet, DCCRN, and FullSubNet)
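As an illustration of the AUC measure named in the findings, the sketch below scores an estimated soft mask against a binary reference mask. The summary does not say how the reference is built, so the binary reference (for example, an ideal binary mask derived from clean speech) is an assumption, and `mask_auc` is a hypothetical helper name. The AUC is computed with the rank-based (Mann-Whitney) formula, using average ranks for ties.

```python
import numpy as np

def mask_auc(est_mask, ref_binary):
    """Area under the ROC curve of a soft mask scored against a binary reference.

    est_mask   : array of mask values, any shape (e.g. frequency x time)
    ref_binary : array of the same shape, truthy where the target is present
    The choice of reference mask is an assumption of this sketch.
    """
    scores = np.asarray(est_mask, dtype=float).ravel()
    labels = np.asarray(ref_binary).ravel().astype(bool)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("reference mask must contain both classes")

    # 1-based ranks of the scores, with tied scores sharing their average rank.
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)
    avg_rank = last_rank - (counts - 1) / 2.0
    ranks = avg_rank[inverse]

    # Mann-Whitney U statistic of the positive class, normalised to [0, 1].
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)
```

An AUC of 1.0 means every target bin receives a higher mask value than every non-target bin, and 0.5 corresponds to a mask that carries no information about the reference.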
Abstract
In this paper, we present a method that further improves the speech enhancement obtained with recently introduced Deep Neural Network (DNN) models. We propose a multi-channel method for refining time-frequency masks obtained with single-channel DNNs, which consists of an iterative Complex Gaussian Mixture Model (CGMM) based algorithm followed by optimum spatial filtering. We validate our approach on time-frequency masks estimated with three recent deep learning models, namely DCUnet, DCCRN, and FullSubNet. We show that the proposed mask refinement procedure improves the accuracy of the estimated masks, in terms of the Area Under the ROC Curve (AUC) measure, and consequently the overall quality of the enhanced speech signal, as measured by PESQ improvement, and that the improvement is consistent across all three DNN models.
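The two stages described in the abstract can be sketched roughly as follows. This is not the authors' implementation: it assumes a standard two-class CGMM (noise only, and speech plus noise) with time-varying variances and uniform mixture weights, initialises the EM posteriors from the DNN mask, and uses a mask-based MVDR beamformer as one common choice of spatial filter. The abstract does not name the filter, and the function names, the diagonal loading, and the iteration count are all hypothetical.

```python
import numpy as np

def _diag_load(R, eps):
    """Add trace-scaled diagonal loading so the matrices stay invertible."""
    M = R.shape[-1]
    scale = np.trace(R, axis1=-2, axis2=-1).real / M
    return R + eps * scale[..., None, None] * np.eye(M)

def cgmm_refine_mask(Y, dnn_mask, n_iter=10, eps=1e-6):
    """Refine a speech mask with CGMM-based EM (sketch, uniform mixture weights).

    Y        : complex STFT, shape (F, T, M) = (frequencies, frames, microphones)
    dnn_mask : initial speech mask in [0, 1], shape (F, T)
    Returns the speech-class posterior, shape (F, T).
    """
    F, T, M = Y.shape
    outer = np.einsum('ftm,ftn->ftmn', Y, Y.conj())          # y y^H per bin
    # Class 0 = noise, class 1 = speech + noise; posteriors start from the DNN mask.
    post = np.stack([1.0 - dnn_mask, dnn_mask]).clip(eps, 1.0)
    post /= post.sum(axis=0, keepdims=True)
    phi = np.ones((2, F, T))                                  # time-varying variances
    for _ in range(n_iter):
        # M-step: spatial covariance per class and frequency, then variances.
        R = np.einsum('kft,ftmn->kfmn', post / phi, outer)
        R /= post.sum(axis=2)[..., None, None]
        R = _diag_load(R, eps)
        quad = np.einsum('ftm,kfmn,ftn->kft', Y.conj(), np.linalg.inv(R), Y).real
        quad = np.maximum(quad, eps)                          # y^H R^-1 y
        phi = quad / M
        # E-step: posteriors from the complex Gaussian log-likelihoods.
        _, logdet = np.linalg.slogdet(R)
        log_lik = -M * np.log(phi) - logdet[..., None] - quad / phi
        log_lik -= log_lik.max(axis=0, keepdims=True)
        post = np.exp(log_lik).clip(eps, None)
        post /= post.sum(axis=0, keepdims=True)
    return post[1]

def mask_mvdr(Y, speech_mask, ref=0, eps=1e-6):
    """Mask-based MVDR beamformer (assumed filter), returns shape (F, T)."""
    F, T, M = Y.shape
    outer = np.einsum('ftm,ftn->ftmn', Y, Y.conj())

    def psd(mask):
        norm = np.maximum(mask.sum(axis=1), eps)[:, None, None]
        return np.einsum('ft,ftmn->fmn', mask, outer) / norm

    phi_s = psd(speech_mask)
    phi_n = _diag_load(psd(1.0 - speech_mask), eps)
    num = np.linalg.solve(phi_n, phi_s)                       # Phi_N^-1 Phi_S
    trace = np.trace(num, axis1=-2, axis2=-1)
    w = num[:, :, ref] / (trace[:, None] + eps)               # filter per frequency
    return np.einsum('fm,ftm->ft', w.conj(), Y)               # w^H y
```

Under these assumptions, a call such as `mask_mvdr(Y, cgmm_refine_mask(Y, dnn_mask))` chains the two stages, where `Y` is the multi-channel STFT and `dnn_mask` is the mask produced by a single-channel model such as DCUnet, DCCRN, or FullSubNet.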
