Direction-Preserving MIMO Speech Enhancement Using a Neural Covariance Estimator
Thomas Deppisch

TL;DR
This paper introduces a neural covariance estimator for direction-preserving MIMO speech enhancement, improving multichannel speech quality and spatial property retention with low computational cost.
Contribution
It presents a fully blind, neural-based covariance estimation method that outperforms mask-based baselines in multichannel speech enhancement tasks.
Findings
Experiments show improved speech quality and better preservation of spatial properties.
The proposed method approaches oracle performance with fewer parameters.
Downstream task performance improves over the baseline methods.
Abstract
Multichannel speech enhancement is widely used as a front-end in microphone array processing systems. While most existing approaches produce a single enhanced signal, direction-preserving multiple-input multiple-output (MIMO) methods instead aim to provide enhanced multichannel signals that retain directional properties, enabling downstream applications such as beamforming, binaural rendering, and direction-of-arrival estimation. In this work, we propose a fully blind, direction-preserving MIMO speech enhancement method based on neural estimation of the spatial noise covariance matrix. A lightweight OnlineSpatialNet estimates a scale-normalized Cholesky factor of the frequency-domain noise covariance, which is combined with a direction-preserving MIMO Wiener filter to enhance speech while preserving the spatial characteristics of both target and residual noise. In contrast to prior…
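To make the processing chain in the abstract concrete, the following is a minimal NumPy sketch for a single frequency bin. It is not the paper's implementation: it assumes the textbook MIMO multichannel Wiener filter W = (Φx − Φn) Φx⁻¹, and it assumes the scale normalization means a unit-Frobenius-norm Cholesky factor paired with a separate scalar. The function names, the `diag_load` parameter, and the random stand-in for the network output are all hypothetical.

```python
import numpy as np


def covariance_from_cholesky(chol_factor, scale):
    """Rebuild a Hermitian PSD noise covariance from a normalized
    lower-triangular factor and a scalar scale (assumed parameterization)."""
    lower = np.tril(chol_factor)
    return scale * (lower @ lower.conj().T)


def mimo_wiener_filter(phi_x, phi_n, diag_load=0.0):
    """Textbook MIMO Wiener filter W = I - phi_n @ inv(phi_x).

    The output has as many channels as the input, so inter-channel
    (directional) cues of the speech estimate are retained.
    """
    m = phi_x.shape[-1]
    # Optional diagonal loading relative to the average channel power.
    load = diag_load * np.trace(phi_x).real / m
    phi_x_reg = phi_x + load * np.eye(m)
    # phi_n @ inv(phi_x) via a linear solve; both matrices are Hermitian.
    noise_part = np.linalg.solve(phi_x_reg, phi_n.conj().T).conj().T
    return np.eye(m) - noise_part


# Toy data for one frequency bin with M microphones.
rng = np.random.default_rng(0)
M = 4

# Rank-1 speech covariance from a hypothetical steering vector.
steering = rng.standard_normal(M) + 1j * rng.standard_normal(M)
phi_s = 2.0 * np.outer(steering, steering.conj())

# Stand-in for the network output: a lower-triangular factor with a
# positive real diagonal, split into a unit-norm factor and a scale.
raw = np.tril(rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M)))
raw[np.diag_indices(M)] = np.abs(np.diag(raw)) + 0.5
frob = np.linalg.norm(raw)
chol_normalized = raw / frob
scale = frob ** 2

phi_n = covariance_from_cholesky(chol_normalized, scale)
phi_x = phi_s + phi_n  # speech and noise assumed uncorrelated

W = mimo_wiener_filter(phi_x, phi_n)

# Apply the filter to one noisy multichannel STFT vector (M channels in, M out).
x = rng.standard_normal(M) + 1j * rng.standard_normal(M)
y = W @ x
```

In practice the filter would be computed per time-frequency bin, and a small `diag_load` helps when the estimated mixture covariance is poorly conditioned.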
