A Two-stage Complex Network using Cycle-consistent Generative Adversarial Networks for Speech Enhancement

Guochen Yu; Yutian Wang; Hui Wang; Qin Zhang; Chengshi Zheng

arXiv:2109.02011 · cs.SD · September 7, 2021 · 5 cites

TL;DR

This paper introduces a two-stage speech enhancement system: a CycleGAN-based network first estimates the spectral magnitude, and a complex spectral refining network then refines the resulting coarse complex spectrum. The system is reported to improve noise suppression and speech quality over previous CycleGAN-based methods.

Contribution

The paper proposes a two-stage denoising approach that reduces residual noise and estimates phase information, addressing two limitations of single-stage CycleGAN-based systems: noise components that propagate through the cycle, and a phase that is left unaltered.

Findings

01 Outperforms previous CycleGAN-based systems in noise suppression

02 Achieves higher scores on multiple speech enhancement metrics

03 Demonstrates robustness across different datasets

Abstract

Cycle-consistent generative adversarial networks (CycleGAN) have shown their promising performance for speech enhancement (SE), while one intractable shortcoming of these CycleGAN-based SE systems is that the noise components propagate throughout the cycle and cannot be completely eliminated. Additionally, conventional CycleGAN-based SE systems only estimate the spectral magnitude, while the phase is unaltered. Motivated by the multi-stage learning concept, we propose a novel two-stage denoising system that combines a CycleGAN-based magnitude enhancing network and a subsequent complex spectral refining network in this paper. Specifically, in the first stage, a CycleGAN-based model is responsible for only estimating magnitude, which is subsequently coupled with the original noisy phase to obtain a coarsely enhanced complex spectrum. After that, the second stage is applied to further…
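The coupling step at the end of the first stage can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `couple_magnitude_with_noisy_phase` is hypothetical, and the "enhanced" magnitude is a scaled copy of the noisy magnitude standing in for the output of the CycleGAN-based network. The sketch only shows how an estimated magnitude is combined with the original noisy phase to form a coarsely enhanced complex spectrum.

```python
import numpy as np


def couple_magnitude_with_noisy_phase(enhanced_mag, noisy_spec):
    """Combine an estimated magnitude with the phase of the noisy spectrum.

    enhanced_mag: real, non-negative array (frequency bins x frames).
    noisy_spec:   complex array of the same shape (noisy STFT).
    Returns a complex array: the coarsely enhanced spectrum.
    """
    noisy_phase = np.angle(noisy_spec)            # phase is left unaltered
    return enhanced_mag * np.exp(1j * noisy_phase)


# Toy data (assumption): a random complex array stands in for a noisy STFT.
rng = np.random.default_rng(0)
noisy_spec = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

# Placeholder (assumption) for the stage-one magnitude estimate.
enhanced_mag = 0.5 * np.abs(noisy_spec)

coarse_spec = couple_magnitude_with_noisy_phase(enhanced_mag, noisy_spec)
```

Because only the magnitude changes at this point, any phase error in the noisy input carries over into `coarse_spec`, which is the motivation the abstract gives for a second stage that operates on the complex spectrum.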

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Music and Audio Processing