Magnitude-and-phase-aware Speech Enhancement with Parallel Sequence Modeling

Yuewei Zhang; Huanbin Zou; Jie Zhu

arXiv:2310.07316 · eess.AS · October 12, 2023 · ASRU

TL;DR

This paper introduces MPCRN, a speech enhancement model in which a real-valued network estimates a magnitude mask and a normalized cIRM. Combined with a parallel sequence modeling (PSM) block, it outperforms previous phase estimation methods without resorting to complex-valued neural networks.

Contribution

It proposes a magnitude-and-phase-aware CRN model with parallel sequence modeling that avoids complex-valued networks while improving speech enhancement quality.

Findings

01

MPCRN outperforms previous phase estimation methods for speech enhancement.

02

Estimating magnitude and phase with a real-valued network avoids the significant increase in model complexity caused by complex-valued networks.

03

The parallel sequence modeling (PSM) block improves the RNN block of the CRN-based SE model.

Abstract

In speech enhancement (SE), phase estimation is important for perceptual quality, so many methods take the complex short-time Fourier transform (STFT) spectrum of clean speech or the complex ideal ratio mask (cIRM) as the learning target. To predict these complex targets, the common solution is to design a complex-valued neural network or to use a real-valued network that predicts the real and imaginary parts of the target separately. In this paper, we instead propose using a real-valued network to estimate the magnitude mask and the normalized cIRM, which not only avoids the significant increase in model complexity caused by complex-valued networks but also performs better than previous phase estimation methods. We also devise a parallel sequence modeling (PSM) block to improve the RNN block in the convolutional recurrent network (CRN)-based SE model. We name our method as magnitude-and-phase-aware and…
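
To make the two learning targets concrete, here is a minimal per-bin sketch. The abstract does not give the paper's exact definitions, so the sketch assumes the magnitude mask is the modulus of the cIRM and the normalized cIRM is the cIRM divided by its modulus (a unit-modulus, phase-only term). The function names, the `EPS` guard, and the fallback values for empty bins are hypothetical and not taken from the paper.

```python
EPS = 1e-8  # assumed guard against division by zero (not from the paper)


def magnitude_and_phase_targets(clean_bin, noisy_bin):
    """Return (magnitude mask, normalized cIRM) for one complex STFT bin.

    Assumed definitions: cIRM = S / Y, magnitude mask = |cIRM|,
    normalized cIRM = cIRM / |cIRM|.
    """
    if abs(noisy_bin) < EPS:
        # Empty noisy bin: nothing to scale; use a neutral phase term.
        return 0.0, complex(1.0, 0.0)
    cirm = clean_bin / noisy_bin
    mag_mask = abs(cirm)
    if mag_mask < EPS:
        # Clean bin is (near) silent: phase is undefined, pick a neutral one.
        return 0.0, complex(1.0, 0.0)
    norm_cirm = cirm / mag_mask  # unit modulus, carries only phase
    return mag_mask, norm_cirm


def apply_masks(noisy_bin, mag_mask, norm_cirm):
    """Rebuild the enhanced bin from the two real-network outputs."""
    return mag_mask * norm_cirm * noisy_bin


if __name__ == "__main__":
    clean = complex(0.3, -0.4)
    noisy = complex(1.0, 1.0)
    mask, phase_term = magnitude_and_phase_targets(clean, noisy)
    estimate = apply_masks(noisy, mask, phase_term)
    print("magnitude mask:", mask)
    print("|normalized cIRM|:", abs(phase_term))
    print("reconstruction error:", abs(estimate - clean))
```

Under these assumptions, both quantities can be produced by a real-valued network: the magnitude mask is a single non-negative real number per bin, and the normalized cIRM is a pair of real numbers (real and imaginary parts) constrained to unit modulus.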

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Indoor and Outdoor Localization Technologies

Methods: Convolutional Recurrent Network (CRN)