CMGAN: Conformer-based Metric GAN for Speech Enhancement

Ruizhe Cao; Sherif Abdulatif; Bin Yang

arXiv:2203.15149 · cs.SD · May 7, 2024

TL;DR

This paper introduces CMGAN, a conformer-based GAN for speech enhancement in the time-frequency domain. It models both local and global dependencies in the spectrogram and reports a PESQ of 3.41 and an SSNR of 11.10 dB on the Voice Bank+DEMAND dataset.

Contribution

The paper presents a conformer-based generator that estimates magnitude and complex spectrograms, paired with a metric discriminator that optimizes the generator with respect to an evaluation score, improving speech quality metrics over prior methods.
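A minimal sketch of how a metric discriminator can steer a generator toward a quality score. It assumes a MetricGAN-style least-squares objective and a PESQ score rescaled to [0, 1]. The function names, the rescaling range, and the exact loss form are illustrative assumptions, not details confirmed by this page.

```python
def normalize_pesq(pesq, lo=1.0, hi=4.5):
    """Map a PESQ score to [0, 1]. The range endpoints are an assumption."""
    scaled = (pesq - lo) / (hi - lo)
    return min(max(scaled, 0.0), 1.0)


def discriminator_loss(d_clean, d_enhanced, target_score):
    """Least-squares loss for a metric discriminator (assumed form).

    d_clean:      discriminator output for the (clean, clean) pair
    d_enhanced:   discriminator output for the (clean, enhanced) pair
    target_score: normalized metric of the enhanced signal, in [0, 1]
    """
    return (d_clean - 1.0) ** 2 + (d_enhanced - target_score) ** 2


def generator_metric_loss(d_enhanced):
    """The generator is pushed to make the discriminator predict the top score."""
    return (d_enhanced - 1.0) ** 2
```

Under these assumptions the discriminator learns to imitate the metric, and the generator is rewarded when the discriminator's predicted score for its output approaches 1.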

Findings

01

Achieves a PESQ of 3.41 and an SSNR of 11.10 dB on the Voice Bank+DEMAND dataset.

02

Outperforms previous speech enhancement models in quantitative evaluations.

03

Uses two-stage conformer blocks to model both time and frequency dependencies in the spectrogram.
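For readers unfamiliar with the SSNR figure in the first finding, here is a rough sketch of segmental SNR: a per-frame SNR in dB, clamped to a fixed range and averaged over frames. The frame length, hop size, and clamping range of [-10, 35] dB are common conventions assumed here; the evaluation script used in the paper may differ.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, hop=128,
                  lo=-10.0, hi=35.0, eps=1e-10):
    """Average per-frame SNR in dB, with each frame clamped to [lo, hi].

    clean, enhanced: sequences of float samples on the same time axis.
    frame_len, hop, lo, hi: assumed defaults, not taken from the paper.
    """
    n = min(len(clean), len(enhanced))
    scores = []
    for start in range(0, n - frame_len + 1, hop):
        ref = clean[start:start + frame_len]
        est = enhanced[start:start + frame_len]
        signal_energy = sum(r * r for r in ref)
        error_energy = sum((r - e) ** 2 for r, e in zip(ref, est))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
        scores.append(min(max(snr_db, lo), hi))
    if not scores:
        raise ValueError("signal is shorter than one frame")
    return sum(scores) / len(scores)
```

Clamping keeps silent or perfectly reconstructed frames from dominating the average.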

Abstract

Recently, convolution-augmented transformer (Conformer) has achieved promising performance in automatic speech recognition (ASR) and time-domain speech enhancement (SE), as it can capture both local and global dependencies in the speech signal. In this paper, we propose a conformer-based metric generative adversarial network (CMGAN) for SE in the time-frequency (TF) domain. In the generator, we utilize two-stage conformer blocks to aggregate all magnitude and complex spectrogram information by modeling both time and frequency dependencies. The estimation of magnitude and complex spectrogram is decoupled in the decoder stage and then jointly incorporated to reconstruct the enhanced speech. In addition, a metric discriminator is employed to further improve the quality of the enhanced estimated speech by optimizing the generator with respect to a corresponding evaluation score.…
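The abstract says the magnitude and complex estimates are decoded separately and then combined, but it does not give the combination rule. The sketch below shows one plausible reading, assumed for illustration only: a magnitude mask is applied to the noisy magnitude, the noisy phase is reused, and a complex residual is added on top. The names `combine_bin` and `combine_spectrogram` are hypothetical.

```python
import cmath


def combine_bin(noisy, mag_mask, residual):
    """Combine the two decoder outputs for one time-frequency bin.

    noisy:    complex STFT value of the noisy input
    mag_mask: non-negative mask from the magnitude branch (assumed)
    residual: complex correction from the complex branch (assumed)
    """
    # Masked magnitude paired with the noisy phase.
    masked = cmath.rect(mag_mask * abs(noisy), cmath.phase(noisy))
    # The complex branch refines both real and imaginary parts.
    return masked + residual


def combine_spectrogram(noisy_spec, mask, residual):
    """Apply combine_bin over a [time][frequency] nested list."""
    return [
        [combine_bin(y, m, r) for y, m, r in zip(row_y, row_m, row_r)]
        for row_y, row_m, row_r in zip(noisy_spec, mask, residual)
    ]
```

In this reading the magnitude branch handles most of the noise suppression, while the complex residual can correct the phase that masking alone leaves untouched.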

Code & Models

Repositories

ruizhecao96/cmgan
PyTorch (official)
