Nonlinear Spatial Filtering in Multichannel Speech Enhancement
Kristina Tesch, Timo Gerkmann

TL;DR
This paper explores the use of nonlinear spatial filters learned by neural networks for multichannel speech enhancement, demonstrating significant improvements over traditional linear beamformers, especially in non-Gaussian noise scenarios.
Contribution
It introduces a joint nonlinear spatial and spectral filtering approach using neural networks, outperforming traditional linear beamforming in various non-Gaussian noise conditions.
Findings
Nonlinear filters outperform linear beamformers in super-Gaussian noise.
Nonlinear filters effectively suppress multiple interfering sources.
Significant improvements are observed on real-world CHiME3 data.
Abstract
The majority of multichannel speech enhancement algorithms are two-step procedures that first apply a linear spatial filter, a so-called beamformer, and combine it with a single-channel approach for postprocessing. However, the serial concatenation of a linear spatial filter and a postfilter is not generally optimal in the minimum mean square error (MMSE) sense for noise distributions other than a Gaussian distribution. Rather, the MMSE optimal filter is a joint spatial and spectral nonlinear function. While estimating the parameters of such a filter with traditional methods is challenging, modern neural networks may provide an efficient way to learn the nonlinear function directly from data. To see if further research in this direction is worthwhile, in this work we examine the potential performance benefit of replacing the common two-step procedure with a joint spatial and spectral…
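For reference, the conventional two-step procedure mentioned in the abstract (a linear spatial filter followed by a single-channel postfilter) can be sketched for a single frequency bin as below. This is a minimal NumPy illustration, not the authors' implementation. It assumes an MVDR beamformer as the linear spatial filter and a Wiener gain as the postfilter, and it assumes the noise covariance, steering vector, and speech power are known. The function names `mvdr_weights` and `two_step_enhance` are hypothetical.

```python
import numpy as np


def mvdr_weights(noise_cov, steering):
    """MVDR weights w = R^{-1} d / (d^H R^{-1} d) for one frequency bin.

    noise_cov: (channels, channels) Hermitian noise covariance R (assumed known).
    steering:  (channels,) steering vector d toward the target (assumed known).
    """
    rinv_d = np.linalg.solve(noise_cov, steering)
    return rinv_d / (steering.conj() @ rinv_d)


def two_step_enhance(noisy, noise_cov, steering, speech_psd):
    """Linear beamformer followed by a single-channel Wiener postfilter.

    noisy:      (channels, frames) complex STFT coefficients of one frequency bin.
    speech_psd: speech power at the beamformer output (assumed known here).
    """
    # Step 1: linear spatial filtering, one complex output per frame.
    w = mvdr_weights(noise_cov, steering)
    beamformed = w.conj() @ noisy

    # Step 2: spectral postfilter driven by the residual noise power w^H R w.
    residual_noise_psd = np.real(w.conj() @ noise_cov @ w)
    gain = speech_psd / (speech_psd + residual_noise_psd)
    return gain * beamformed


# Toy usage: 4 microphones, spatially white unit-power noise, noise-free input.
steering = np.ones(4, dtype=complex)
noise_cov = np.eye(4, dtype=complex)
clean = np.array([1.0 + 0.0j, 0.0 + 2.0j, -1.0 + 1.0j])
noisy = steering[:, None] * clean[None, :]
enhanced = two_step_enhance(noisy, noise_cov, steering, speech_psd=1.0)
```

In this sketch both steps act linearly on the observation once the statistics are fixed. The joint nonlinear filter studied in the paper would instead replace the two steps with a single learned mapping from the multichannel input to the speech estimate.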
