MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra
Ye-Xin Lu, Yang Ai, Zhen-Hua Ling

TL;DR
MP-SENet is a speech enhancement model that denoises magnitude and phase spectra in parallel. Its encoder and its two parallel decoders (a magnitude mask decoder and a phase decoder) are bridged by convolution-augmented transformers, and it reaches a PESQ of 3.50 on VoiceBank+DEMAND.
Contribution
It introduces parallel denoising of magnitude and phase spectra within a codec (encoder-decoder) architecture whose encoder and decoder are bridged by convolution-augmented transformers.
Findings
Achieves a PESQ of 3.50 on the VoiceBank+DEMAND dataset
Outperforms existing advanced speech enhancement methods
Trains jointly with multi-level losses defined on magnitude spectra, phase spectra, short-time complex spectra, and time-domain waveforms
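A multi-level training objective of this kind can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' implementation: the function names, the equal loss weights, and the choice of one error measure per level (mean squared error for magnitude and complex spectra, an anti-wrapped absolute difference for phase, mean absolute error for waveforms) are all assumptions made for the example.

```python
import numpy as np

def anti_wrap(x):
    """Fold a phase difference into [0, pi] so that 2*pi jumps cost nothing."""
    return np.abs(x - 2.0 * np.pi * np.round(x / (2.0 * np.pi)))

def multi_level_loss(mag_hat, pha_hat, wav_hat, mag, pha, wav,
                     weights=(1.0, 1.0, 1.0, 1.0)):
    """Combine losses at four levels; the weights are hypothetical."""
    w_mag, w_pha, w_com, w_time = weights

    # Magnitude level: squared error between magnitude spectra.
    loss_mag = np.mean((mag_hat - mag) ** 2)

    # Phase level: wrapped phases differ by multiples of 2*pi without
    # being wrong, so the difference is anti-wrapped before averaging.
    loss_pha = np.mean(anti_wrap(pha_hat - pha))

    # Complex level: rebuild complex spectra and compare real/imag parts.
    com_hat = mag_hat * np.exp(1j * pha_hat)
    com = mag * np.exp(1j * pha)
    loss_com = np.mean(np.abs(com_hat - com) ** 2)

    # Time level: absolute error between waveforms.
    loss_time = np.mean(np.abs(wav_hat - wav))

    total = (w_mag * loss_mag + w_pha * loss_pha
             + w_com * loss_com + w_time * loss_time)
    parts = {"magnitude": loss_mag, "phase": loss_pha,
             "complex": loss_com, "time": loss_time}
    return total, parts
```

In a real training setup the waveform estimate would come from an inverse STFT of the enhanced spectrum, and the four terms would be computed with an autodiff framework rather than plain NumPy.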
Abstract
This paper proposes MP-SENet, a novel Speech Enhancement Network which directly denoises Magnitude and Phase spectra in parallel. The proposed MP-SENet adopts a codec architecture in which the encoder and decoder are bridged by convolution-augmented transformers. The encoder aims to encode time-frequency representations from the input noisy magnitude and phase spectra. The decoder is composed of a parallel magnitude mask decoder and phase decoder, which directly recover clean magnitude spectra and clean wrapped phase spectra by incorporating a learnable sigmoid activation and a parallel phase estimation architecture, respectively. Multi-level losses defined on magnitude spectra, phase spectra, short-time complex spectra, and time-domain waveforms are used to train the MP-SENet model jointly. Experimental results show that our proposed MP-SENet achieves a PESQ of 3.50 on the public…
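The two decoder output stages named in the abstract can be sketched as below. The abstract does not give their exact form, so this is only one plausible reading: the learnable sigmoid is assumed to be a scaled sigmoid with a trainable per-frequency slope `alpha` and a fixed upper bound `beta`, and the parallel phase estimation is assumed to produce two parallel outputs (here called `pseudo_real` and `pseudo_imag`) that are combined with a two-argument arctangent. All names, shapes, and the value `beta=2.0` are assumptions made for the example.

```python
import numpy as np

def learnable_sigmoid(x, alpha, beta=2.0):
    """Scaled sigmoid beta / (1 + exp(-alpha * x)).

    alpha stands in for a trainable slope (assumed one value per
    frequency bin); beta bounds the mask above (assumed fixed).
    """
    return beta / (1.0 + np.exp(-alpha * x))

def parallel_phase(pseudo_real, pseudo_imag):
    """Combine two parallel outputs into a wrapped phase in [-pi, pi]."""
    return np.arctan2(pseudo_imag, pseudo_real)

def decode(noisy_mag, mask_logits, alpha, pseudo_real, pseudo_imag):
    """Apply the magnitude mask and estimate the phase side by side.

    Arrays are (freq, time); alpha is (freq, 1) and broadcasts over time.
    """
    mask = learnable_sigmoid(mask_logits, alpha)
    clean_mag = noisy_mag * mask          # masked magnitude spectrum
    clean_pha = parallel_phase(pseudo_real, pseudo_imag)
    return clean_mag, clean_pha
```

Because `arctan2` always returns values in [-pi, pi], the phase output is wrapped by construction, and a mask bounded by `beta` greater than 1 lets the magnitude branch amplify as well as attenuate individual time-frequency bins.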
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development
Methods: Sigmoid Activation
