Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement

Ye-Xin Lu, Yang Ai, and Zhen-Hua Ling

arXiv:2308.08926 · eess.AS · April 2, 2024 · 5 cites

Open Access · 1 Repo · 3 Models

TL;DR

This paper introduces MP-SENet, a speech enhancement network that explicitly estimates magnitude and phase spectra in parallel. It uses a Transformer-embedded encoder-decoder architecture to improve speech quality in denoising, dereverberation, and bandwidth extension.

Contribution

The paper presents a Transformer-embedded encoder-decoder architecture for explicit parallel estimation of magnitude and phase spectra, advancing speech enhancement methods.

Findings

01 Achieves state-of-the-art performance in speech denoising, dereverberation, and bandwidth extension.

02 Explicit phase estimation improves perceptual speech quality.

03 Employs multi-level loss functions and a metric discriminator for better training and perceptual alignment.

Abstract

Phase information has a significant impact on speech perceptual quality and intelligibility. However, existing speech enhancement methods encounter limitations in explicit phase estimation due to the non-structural nature and wrapping characteristics of the phase, leading to a bottleneck in enhanced speech quality. To overcome this issue, we propose MP-SENet, a novel Speech Enhancement Network that explicitly enhances Magnitude and Phase spectra in parallel. The proposed MP-SENet comprises a Transformer-embedded encoder-decoder architecture. The encoder encodes the input distorted magnitude and phase spectra into time-frequency representations, which are then fed into time-frequency Transformers that alternately capture time and frequency dependencies. The decoder comprises a magnitude mask decoder and a phase decoder, directly enhancing magnitude and…
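The two ideas in the abstract, phase wrapping and a decoder that produces a magnitude mask and a phase estimate side by side, can be illustrated with a small standard-library Python sketch. This is not the authors' implementation: the helper names (`wrap_to_pi`, `to_polar`, `recombine`), the toy spectrum, the mask values, and the phase offset are all hypothetical and chosen only for illustration. The sketch assumes the enhanced spectrum is rebuilt as (mask × noisy magnitude) at the estimated phase.

```python
import cmath
import math


def wrap_to_pi(angle):
    """Map an angle in radians onto the principal interval [-pi, pi]."""
    return angle - 2.0 * math.pi * round(angle / (2.0 * math.pi))


def to_polar(spectrum):
    """Split complex spectral bins into magnitude and (wrapped) phase lists."""
    mags = [abs(c) for c in spectrum]
    phases = [cmath.phase(c) for c in spectrum]
    return mags, phases


def recombine(mags, mask, phases):
    """Rebuild complex bins from masked magnitudes and estimated phases."""
    return [cmath.rect(m * g, p) for m, g, p in zip(mags, mask, phases)]


# Toy "distorted" spectrum with three bins (hypothetical values).
noisy = [complex(1.0, 1.0), complex(-2.0, 0.5), complex(0.0, -3.0)]
mags, phases = to_polar(noisy)

# Stand-ins for the two decoder outputs (hypothetical values).
mask = [0.5, 1.0, 0.8]                  # magnitude mask, one gain per bin
est_phases = [p + 0.1 for p in phases]  # estimated phase, one angle per bin

enhanced = recombine(mags, mask, est_phases)

# Wrapping: 3.1 rad and -3.1 rad point in nearly the same direction,
# yet their raw numeric difference is close to 2*pi.
a, b = 3.1, -3.1
naive_error = abs(a - b)                # raw difference, ignores periodicity
wrapped_error = abs(wrap_to_pi(a - b))  # angular distance on the circle

print(enhanced)
print(naive_error, wrapped_error)
```

The gap between `naive_error` and `wrapped_error` shows why a plain distance on raw phase values is a poor fit for phase: two angles that are nearly identical on the circle can look maximally far apart numerically.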

Code & Models

Repositories

yxlu-0102/MP-SENet
pytorch

Models

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis