TL;DR
This paper introduces a deep learning approach that embeds the multi-frame MVDR filter within a neural network to improve single-microphone speech enhancement, leveraging temporal convolutional networks to estimate filter parameters.
Contribution
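The paper embeds the MFMVDR filter in a deep learning framework in which TCNs estimate the quantities the filter requires (the speech interframe correlation vector and the noise correlation matrix), with the aim of improving on existing single-microphone speech enhancement methods.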
Findings
Achieves competitive speech enhancement results on the Deep Noise Suppression Challenge dataset.
Estimates MFMVDR parameters more effectively than direct mask estimation methods.
Outperforms Conv-TasNet in PESQ and STOI metrics.
Abstract
Multi-frame algorithms for single-microphone speech enhancement, e.g., the multi-frame minimum variance distortionless response (MFMVDR) filter, are able to exploit speech correlation across adjacent time frames in the short-time Fourier transform (STFT) domain. Provided that accurate estimates of the required speech interframe correlation vector and the noise correlation matrix are available, it has been shown that the MFMVDR filter yields a substantial noise reduction while hardly introducing any speech distortion. Aiming at merging the speech enhancement potential of the MFMVDR filter and the estimation capability of temporal convolutional networks (TCNs), in this paper we propose to embed the MFMVDR filter within a deep learning framework. The TCNs are trained to map the noisy speech STFT coefficients to the required quantities by minimizing the scale-invariant signal-to-distortion…
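To make the filtering step concrete, here is a minimal NumPy sketch of how MFMVDR weights could be computed for a single time-frequency bin once the two quantities named in the abstract are available. It assumes the standard MFMVDR form from the literature, w = Φ⁻¹γ / (γᴴΦ⁻¹γ), where Φ is the noise correlation matrix and γ is the speech interframe correlation vector. The function names, the diagonal loading term, and the toy data are illustrative assumptions, not the authors' implementation, and the TCN-based estimation of Φ and γ is not shown.

```python
import numpy as np


def mfmvdr_weights(noise_corr, speech_ifc, diag_load=1e-6):
    """Compute MFMVDR weights for one time-frequency bin (illustrative sketch).

    noise_corr: (N, N) Hermitian noise correlation matrix over N adjacent frames.
    speech_ifc: (N,) complex speech interframe correlation vector.
    diag_load:  relative diagonal loading for numerical stability (an assumption,
                not taken from the paper).
    """
    n = noise_corr.shape[0]
    # Regularize the matrix before solving; loading is scaled by the mean diagonal.
    loading = diag_load * np.real(np.trace(noise_corr)) / n
    loaded = noise_corr + loading * np.eye(n)
    # numerator = Phi^{-1} gamma, computed with a linear solve instead of an inverse
    numerator = np.linalg.solve(loaded, speech_ifc)
    # denominator = gamma^H Phi^{-1} gamma (np.vdot conjugates its first argument)
    denominator = np.vdot(speech_ifc, numerator)
    return numerator / denominator


def apply_filter(weights, noisy_frames):
    """Return the filtered STFT coefficient w^H y for one time-frequency bin."""
    return np.vdot(weights, noisy_frames)


# Toy example with random placeholder data (not results from the paper).
rng = np.random.default_rng(0)
num_frames = 4
a = rng.standard_normal((num_frames, num_frames)) \
    + 1j * rng.standard_normal((num_frames, num_frames))
phi_n = a @ a.conj().T + num_frames * np.eye(num_frames)  # Hermitian, positive definite
gamma_x = rng.standard_normal(num_frames) + 1j * rng.standard_normal(num_frames)
gamma_x = gamma_x / gamma_x[0]  # normalize so the current-frame entry equals 1

w = mfmvdr_weights(phi_n, gamma_x)
# Distortionless constraint: w^H gamma should be (numerically) 1.
constraint_value = np.vdot(w, gamma_x)
```

The distortionless constraint wᴴγ = 1 is what lets the filter pass the correlated speech component unchanged while minimizing the output noise power, which matches the abstract's claim of noise reduction with hardly any speech distortion when the estimates are accurate.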
