Uncertainty Estimation in Deep Speech Enhancement Using Complex Gaussian Mixture Models

Huajian Fang; Timo Gerkmann

arXiv:2212.04831·eess.AS·May 16, 2023·1 cites

TL;DR

This paper introduces a method to quantify predictive uncertainty in deep speech enhancement by integrating complex Gaussian mixture models, leading to more robust and accurate clean speech estimation.

Contribution

It proposes a framework that combines statistical complex Gaussian mixture models (CGMMs) with neural networks to model the full posterior distribution of clean speech, capturing both aleatoric and epistemic uncertainty.
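
The split between aleatoric and epistemic uncertainty can be illustrated with the law of total variance. The summary above does not say how the epistemic part is estimated, so this is a minimal sketch that assumes M model outputs per time-frequency bin (for example from an ensemble or repeated stochastic forward passes, a hypothetical setup), each giving a complex clean-speech estimate and a predicted variance. The function name `decompose_uncertainty` is illustrative, not taken from the paper.

```python
import numpy as np


def decompose_uncertainty(means, variances):
    """Split total predictive variance into aleatoric and epistemic parts.

    means:     complex array, shape (M, ...): clean-speech estimates from M models
               (hypothetical: ensemble members or stochastic forward passes)
    variances: real array, shape (M, ...): each model's predicted variance
    """
    means = np.asarray(means)
    variances = np.asarray(variances, dtype=float)
    # Aleatoric part: average of the per-model predicted variances (data noise).
    aleatoric = variances.mean(axis=0)
    # Combined point estimate: average of the per-model means.
    mean_of_means = means.mean(axis=0)
    # Epistemic part: spread of the per-model means around their average
    # (disagreement between models).
    epistemic = np.mean(np.abs(means - mean_of_means) ** 2, axis=0)
    # Law of total variance: total = E[Var] + Var[E].
    return mean_of_means, aleatoric, epistemic, aleatoric + epistemic
```

For a single bin where two models predict means 1 and 3 with variances 0.5 and 1.5, the aleatoric part is 1.0, the epistemic part is 1.0, and the total is 2.0.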

Findings

1. Effectively captures predictive uncertainty in speech enhancement.
2. Achieves superior performance compared with traditional methods.
3. Demonstrates robustness across multiple datasets.

Abstract

Single-channel deep speech enhancement approaches often estimate a single multiplicative mask to extract clean speech without a measure of its accuracy. Instead, in this work, we propose to quantify the uncertainty associated with clean speech estimates in neural network-based speech enhancement. Predictive uncertainty is typically categorized into aleatoric uncertainty and epistemic uncertainty. The former accounts for the inherent uncertainty in data and the latter corresponds to the model uncertainty. Aiming for robust clean speech estimation and efficient predictive uncertainty quantification, we propose to integrate statistical complex Gaussian mixture models (CGMMs) into a deep speech enhancement framework. More specifically, we model the dependency between input and output stochastically by means of a conditional probability density and train a neural network to map the noisy…
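
A conditional CGMM can be written per time-frequency bin as p(s | y) = Σ_k π_k CN(s; μ_k, σ_k²), where CN is the circularly-symmetric complex Gaussian density. The sketch below assumes the network emits the weights π_k, means μ_k, and variances σ_k² directly for each bin; how the paper actually parameterizes them (for example as masks applied to the noisy coefficients) is not stated in the excerpt above. It shows a negative log-likelihood that could serve as the training loss and the mixture mean and variance that could serve as a point estimate and its uncertainty. All function names are hypothetical.

```python
import numpy as np


def cgmm_log_pdf(s, weights, means, variances):
    """log p(s) under a mixture of circularly-symmetric complex Gaussians.

    s:         complex array, shape (...):   clean STFT coefficients (targets)
    weights:   real array,    shape (..., K): positive mixture weights summing to 1 over K
    means:     complex array, shape (..., K): component means
    variances: real array,    shape (..., K): component variances (> 0)
    """
    s = np.asarray(s)[..., None]  # add a component axis so s broadcasts against K
    # log CN(s; mu, var) = -log(pi * var) - |s - mu|^2 / var
    log_comp = -np.log(np.pi * variances) - np.abs(s - means) ** 2 / variances
    log_terms = np.log(weights) + log_comp
    # Numerically stable log-sum-exp over the K components.
    peak = np.max(log_terms, axis=-1, keepdims=True)
    log_sum = peak + np.log(np.sum(np.exp(log_terms - peak), axis=-1, keepdims=True))
    return log_sum[..., 0]


def cgmm_nll(s, weights, means, variances):
    """Average negative log-likelihood over all bins, usable as a training loss."""
    return -np.mean(cgmm_log_pdf(s, weights, means, variances))


def cgmm_moments(weights, means, variances):
    """Mixture mean (posterior-mean point estimate) and total variance per bin."""
    mean = np.sum(weights * means, axis=-1)
    # E|s|^2 of the mixture: weighted sum of each component's variance + |mean|^2.
    second_moment = np.sum(weights * (variances + np.abs(means) ** 2), axis=-1)
    return mean, second_moment - np.abs(mean) ** 2
```

Taking the log-sum-exp over components avoids underflow when a target lies far from every component mean, and the variance returned by `cgmm_moments` follows from E|s|² − |E s|², so it includes both the within-component variances and the spread of the component means.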

Taxonomy

Topics: Speech and Audio Processing · Indoor and Outdoor Localization Technologies · Hearing Loss and Rehabilitation