Integrating Statistical Uncertainty into Neural Network-Based Speech Enhancement

Huajian Fang; Tal Peer; Stefan Wermter; Timo Gerkmann

arXiv:2203.02288·eess.AS·May 6, 2022

TL;DR

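This paper introduces a neural network approach to speech enhancement that estimates a Wiener filter together with its variance, which quantifies the uncertainty of each spectral estimate. It improves performance by combining an approximate MAP (A-MAP) estimator of spectral magnitudes, built from the filter and its uncertainty, with MAP inference of spectral coefficients.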

Contribution

It proposes a novel method to incorporate uncertainty modeling into neural network-based speech enhancement, enhancing estimation accuracy and robustness.
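Under the classical assumption that clean-speech and noise STFT coefficients are independent zero-mean complex Gaussians, the posterior of the clean coefficient given the noisy one is also complex Gaussian. Its mean is the Wiener-filtered noisy coefficient, and its variance is the Wiener gain times the noise power. The network in this paper is trained to predict this pair of quantities (filter and variance) from the noisy spectrogram. The NumPy sketch below computes them from known power spectral densities. The helper name `wiener_posterior` and the use of oracle PSDs are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def wiener_posterior(noisy_stft, speech_psd, noise_psd):
    """Posterior of clean speech S given noisy X = S + N (illustrative sketch).

    Assumes S ~ CN(0, speech_psd) and N ~ CN(0, noise_psd) are independent
    per time-frequency bin; all arrays share the same (freq, time) shape.
    """
    # Wiener filter: ratio of speech power to total power, in [0, 1].
    wiener = speech_psd / (speech_psd + noise_psd)
    # Posterior mean E[S | X]: the Wiener-filtered noisy coefficient.
    posterior_mean = wiener * noisy_stft
    # Posterior variance Var[S | X] = speech_psd * noise_psd / (speech_psd + noise_psd).
    posterior_var = wiener * noise_psd
    return wiener, posterior_mean, posterior_var


# Toy example: one frequency bin, two frames.
X = np.array([[1.0 + 1.0j, 0.5 - 0.2j]])
W, mu, var = wiener_posterior(X, np.array([[3.0, 1.0]]), np.array([[1.0, 1.0]]))
print(W.tolist())    # [[0.75, 0.5]]
print(var.tolist())  # [[0.75, 0.5]]
```

The variance goes to zero when either the speech power or the noise power goes to zero. For a fixed total power, it peaks when the two are equal.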

Findings

1. Uncertainty modeling improves speech enhancement performance.
2. The hybrid loss function effectively combines filter and spectral coefficient estimation.
3. The method outperforms comparable models without uncertainty modeling.

Abstract

Speech enhancement in the time-frequency domain is often performed by estimating a multiplicative mask to extract clean speech. However, most neural network-based methods perform point estimation, i.e., their output consists of a single mask. In this paper, we study the benefits of modeling uncertainty in neural network-based speech enhancement. For this, our neural network is trained to map a noisy spectrogram to the Wiener filter and its associated variance, which quantifies uncertainty, based on the maximum a posteriori (MAP) inference of spectral coefficients. By estimating the distribution instead of the point estimate, one can model the uncertainty associated with each estimate. We further propose to use the estimated Wiener filter and its uncertainty to build an approximate MAP (A-MAP) estimator of spectral magnitudes, which in turn is combined with the MAP inference of spectral…
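The abstract's pipeline can be sketched as follows. The sketch assumes that the network outputs a real-valued Wiener filter `wiener` and a non-negative `variance` for each time-frequency bin, and that the clean coefficient is modeled as complex Gaussian around the Wiener-filtered noisy coefficient. Training minimizes the negative log-posterior, which is shown here up to the constant log(pi). The A-MAP magnitude uses the common large-argument approximation of the Bessel function in the Rician magnitude posterior. The abstract is truncated before it says how the A-MAP estimate enters training, so `hybrid_loss` and its weight `beta` are hypothetical placeholders and not the authors' formulation. All function names are illustrative.

```python
import numpy as np


def gaussian_nll(clean_stft, noisy_stft, wiener, variance, eps=1e-8):
    """Mean negative log-likelihood of S under CN(wiener * X, variance), dropping log(pi)."""
    variance = np.maximum(variance, eps)  # guard against zero or negative variance
    residual = np.abs(clean_stft - wiener * noisy_stft) ** 2
    return float(np.mean(np.log(variance) + residual / variance))


def amap_magnitude(noisy_stft, wiener, variance):
    """Approximate MAP estimate of |S| when S | X ~ CN(wiener * X, variance).

    The magnitude posterior is Rician. Replacing the Bessel function I0(z) by its
    large-argument form exp(z) / sqrt(2 * pi * z) and maximizing in |S| gives
    |S| = (|wiener * X| + sqrt(|wiener * X|**2 + variance)) / 2.
    """
    mean_mag = np.abs(wiener * noisy_stft)
    return 0.5 * (mean_mag + np.sqrt(mean_mag ** 2 + np.maximum(variance, 0.0)))


def hybrid_loss(clean_stft, noisy_stft, wiener, variance, beta=0.5):
    """HYPOTHETICAL hybrid objective: weighted sum of the NLL and an A-MAP magnitude MSE.

    The weight `beta` and the squared-error magnitude term are assumptions for
    illustration, not the formulation from the paper.
    """
    nll = gaussian_nll(clean_stft, noisy_stft, wiener, variance)
    mag = amap_magnitude(noisy_stft, wiener, variance)
    mag_err = float(np.mean((mag - np.abs(clean_stft)) ** 2))
    return beta * nll + (1.0 - beta) * mag_err


# Toy bins: with zero variance the A-MAP magnitude equals the Wiener-filtered one.
X = np.array([2.0 + 0.0j, 2.0 + 0.0j])
W = np.array([0.5, 0.5])
print(amap_magnitude(X, W, np.array([0.0, 3.0])).tolist())  # [1.0, 1.5]
```

The square-root term is never smaller than `|wiener * X|`, so the A-MAP magnitude is at least the Wiener-filtered magnitude and grows with the predicted variance. As a result, bins where the network is uncertain are suppressed less aggressively than the Wiener filter alone would suppress them.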
