MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra

Ye-Xin Lu; Yang Ai; Zhen-Hua Ling

arXiv:2305.13686 · eess.AS · January 15, 2024 · Interspeech · 2 cites

TL;DR

MP-SENet is a speech enhancement model that denoises magnitude and phase spectra in parallel, using an encoder and decoder bridged by convolution-augmented transformers. It reports a PESQ of 3.50 on the VoiceBank+DEMAND dataset.

Contribution

The paper introduces parallel denoising of magnitude and phase spectra within a codec architecture whose encoder and decoder are bridged by convolution-augmented transformers.

Findings

01 Achieves a PESQ of 3.50 on the VoiceBank+DEMAND dataset

02 Outperforms existing advanced speech enhancement methods

03 Trains the model jointly with multi-level loss functions

Abstract

This paper proposes MP-SENet, a novel Speech Enhancement Network which directly denoises Magnitude and Phase spectra in parallel. The proposed MP-SENet adopts a codec architecture in which the encoder and decoder are bridged by convolution-augmented transformers. The encoder aims to encode time-frequency representations from the input noisy magnitude and phase spectra. The decoder is composed of a parallel magnitude mask decoder and phase decoder, which directly recover clean magnitude spectra and clean wrapped phase spectra by incorporating a learnable sigmoid activation and a parallel phase estimation architecture, respectively. Multi-level losses defined on magnitude spectra, phase spectra, short-time complex spectra, and time-domain waveforms are used to train the MP-SENet model jointly. Experimental results show that our proposed MP-SENet achieves a PESQ of 3.50 on the public…
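The abstract does not give the decoder equations, so the following is only a rough sketch of how the two parallel decoder outputs could be turned into an enhanced spectrum for a single time-frequency bin. It assumes that the learnable sigmoid has the form beta / (1 + exp(-alpha * x)) and that the wrapped phase is obtained with a two-argument arctangent over two parallel outputs; the function names, the parameters alpha and beta, and their default values are hypothetical and are not taken from this page.

```python
import cmath
import math


def learnable_sigmoid(x, alpha=1.0, beta=2.0):
    """Sigmoid with slope alpha and upper bound beta.

    Assumption: in a trained model alpha would be a learned parameter;
    here both values are fixed placeholders for illustration.
    """
    return beta / (1.0 + math.exp(-alpha * x))


def decode_bin(noisy_mag, mask_logit, pseudo_real, pseudo_imag):
    """Combine the two parallel decoder outputs for one time-frequency bin.

    Magnitude branch: a bounded mask scales the noisy magnitude.
    Phase branch: atan2 of two parallel outputs yields a phase that is
    already wrapped to the interval (-pi, pi].
    """
    mask = learnable_sigmoid(mask_logit)
    clean_mag = noisy_mag * mask
    clean_phase = math.atan2(pseudo_imag, pseudo_real)
    return clean_mag, clean_phase


def to_complex(mag, phase):
    """Rebuild the short-time complex spectrum value from magnitude and phase."""
    return cmath.rect(mag, phase)


if __name__ == "__main__":
    mag, phase = decode_bin(noisy_mag=0.5, mask_logit=0.0,
                            pseudo_real=0.0, pseudo_imag=1.0)
    print(mag, phase, to_complex(mag, phase))
```

With beta greater than 1, the mask in this sketch can amplify a bin as well as attenuate it, and the atan2 step keeps the predicted phase inside the principal value range without any explicit unwrapping.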

Code & Models

Repositories

yxlu-0102/MP-SENet
PyTorch · Official

Models

🤗 rossijakob/toothless-esnet
model · 3 dl

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development

Methods: Sigmoid Activation