A Two-stage Complex Network using Cycle-consistent Generative Adversarial Networks for Speech Enhancement
Guochen Yu, Yutian Wang, Hui Wang, Qin Zhang, Chengshi Zheng

TL;DR
This paper introduces a novel two-stage speech enhancement system combining CycleGAN-based magnitude estimation with a complex spectral refining network, significantly improving noise suppression and speech quality over existing methods.
Contribution
The paper proposes a two-stage denoising approach that reduces residual noise and refines phase information, going beyond single-stage CycleGAN-based systems that estimate only the spectral magnitude.
Findings
Outperforms previous CycleGAN-based systems in noise suppression
Achieves higher scores on multiple speech enhancement metrics
Demonstrates robustness across different datasets
Abstract
Cycle-consistent generative adversarial networks (CycleGAN) have shown promising performance for speech enhancement (SE), but one intractable shortcoming of CycleGAN-based SE systems is that noise components propagate throughout the cycle and cannot be completely eliminated. Additionally, conventional CycleGAN-based SE systems estimate only the spectral magnitude and leave the phase unaltered. Motivated by the multi-stage learning concept, we propose a novel two-stage denoising system that combines a CycleGAN-based magnitude enhancing network with a subsequent complex spectral refining network. Specifically, in the first stage, a CycleGAN-based model estimates only the magnitude, which is then coupled with the original noisy phase to obtain a coarsely enhanced complex spectrum. After that, the second stage is applied to further…
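The coupling step in the first stage described in the abstract (estimated magnitude combined with the unaltered noisy phase) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `couple_magnitude_with_noisy_phase`, the array shapes, and the placeholder "enhanced" magnitude are all assumptions made for illustration, and no CycleGAN is involved here.

```python
import numpy as np


def couple_magnitude_with_noisy_phase(enhanced_mag, noisy_spec):
    """Combine an estimated magnitude with the phase of the noisy spectrum.

    Hypothetical helper: enhanced_mag is a real, non-negative array and
    noisy_spec is a complex array of the same shape (frequency bins x frames).
    """
    noisy_phase = np.angle(noisy_spec)              # phase is left unaltered
    return enhanced_mag * np.exp(1j * noisy_phase)  # coarse complex spectrum


# Toy complex "noisy spectrum"; the shape is an arbitrary assumption.
rng = np.random.default_rng(0)
noisy_spec = rng.standard_normal((257, 10)) + 1j * rng.standard_normal((257, 10))

# Stand-in for the first-stage magnitude estimate (assumption: a real system
# would obtain this from the trained magnitude-enhancing network).
enhanced_mag = 0.5 * np.abs(noisy_spec)

coarse_spec = couple_magnitude_with_noisy_phase(enhanced_mag, noisy_spec)
```

Because the phase is copied from the noisy input, any phase error remains in `coarse_spec`, which is the limitation the second, complex-domain stage is meant to address.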
Taxonomy
Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Music and Audio Processing
