Integrating Statistical Uncertainty into Neural Network-Based Speech Enhancement
Huajian Fang, Tal Peer, Stefan Wermter, Timo Gerkmann

TL;DR
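This paper trains a neural network to estimate both a Wiener filter and its variance, which quantifies the uncertainty of each spectral estimate. The two outputs are then used to build an approximate MAP (A-MAP) estimator of spectral magnitudes, which is combined with MAP inference of spectral coefficients to improve enhancement performance.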
Contribution
It proposes predicting a distribution rather than a point estimate: the network maps a noisy spectrogram to the Wiener filter and an associated variance, so every estimate carries a measure of its own uncertainty.
Findings
Uncertainty modeling improves speech enhancement performance.
The hybrid loss function effectively combines filter and spectral coefficient estimation.
The method outperforms comparable models without uncertainty modeling.
Abstract
Speech enhancement in the time-frequency domain is often performed by estimating a multiplicative mask to extract clean speech. However, most neural network-based methods perform point estimation, i.e., their output consists of a single mask. In this paper, we study the benefits of modeling uncertainty in neural network-based speech enhancement. For this, our neural network is trained to map a noisy spectrogram to the Wiener filter and its associated variance, which quantifies uncertainty, based on the maximum a posteriori (MAP) inference of spectral coefficients. By estimating the distribution instead of the point estimate, one can model the uncertainty associated with each estimate. We further propose to use the estimated Wiener filter and its uncertainty to build an approximate MAP (A-MAP) estimator of spectral magnitudes, which in turn is combined with the MAP inference of spectral…
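The training idea described in the abstract (predicting a Wiener filter together with a variance instead of a single mask) can be illustrated with a Gaussian negative log-likelihood. The sketch below is not taken from the paper: it assumes a circular complex Gaussian over the clean coefficients whose mean is the filter applied to the noisy coefficients, and the function name `complex_gaussian_nll`, the `eps` stabilizer, and the toy data are all hypothetical.

```python
import numpy as np


def complex_gaussian_nll(clean, noisy, wiener, variance, eps=1e-8):
    """Mean negative log-likelihood, up to an additive constant.

    Assumption (not confirmed by the source): each clean coefficient follows
    a circular complex Gaussian with mean wiener * noisy and the given
    per-bin variance.
    """
    mean = wiener * noisy                      # filtered noisy coefficients
    sq_err = np.abs(clean - mean) ** 2         # squared error per bin
    var = variance + eps                       # avoid log(0) and division by 0
    return np.mean(np.log(var) + sq_err / var)


# Toy time-frequency data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
shape = (4, 5)
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
noisy = clean + noise
wiener = np.full(shape, 0.8)                   # a fixed stand-in for the network output

# Per bin, log(v) + e / v is minimized at v = e, so a variance that matches
# the actual squared error scores better than one that is too large or too small.
matched = np.abs(clean - wiener * noisy) ** 2
loss_matched = complex_gaussian_nll(clean, noisy, wiener, matched)
loss_over = complex_gaussian_nll(clean, noisy, wiener, 10.0 * matched)
loss_under = complex_gaussian_nll(clean, noisy, wiener, 0.1 * matched)
```

Under this assumed loss, the variance term lets the model admit large errors at a logarithmic cost, while overconfident (too small) variances are penalized heavily. That is one common way for a predicted variance to act as an uncertainty estimate.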
