Joint magnitude estimation and phase recovery using Cycle-in-Cycle GAN for non-parallel speech enhancement

Guochen Yu; Andong Li; Yutian Wang; Yinuo Guo; Hui Wang; Chengshi Zheng

arXiv:2109.12591·cs.SD·February 15, 2022

TL;DR

This paper introduces CinCGAN, a novel cycle-in-cycle GAN framework that jointly estimates speech magnitude and phase from unpaired data, significantly improving speech enhancement quality in non-parallel training scenarios.

Contribution

The paper proposes a new Cycle-in-Cycle GAN architecture for joint magnitude and phase estimation using unpaired data, advancing non-parallel speech enhancement techniques.

Findings

01 Outperforms previous methods under non-parallel training conditions
02 Achieves significant noise reduction and speech quality improvement
03 Demonstrates strong performance even with standard paired data

Abstract

Because many real scenarios lack an adequate paired noisy-clean speech corpus, non-parallel training is a promising direction for DNN-based speech enhancement methods. However, because of the severe mismatch between input and target speech, many previous studies focus only on magnitude spectrum estimation and leave the phase unaltered, which degrades speech quality under low signal-to-noise ratio conditions. To tackle this problem, we decouple the difficult optimization target over the original spectrum into spectral magnitude and phase, and propose a novel Cycle-in-Cycle generative adversarial network (dubbed CinCGAN) to jointly estimate the spectral magnitude and phase information stage by stage under unpaired data. In the first stage, we pretrain a magnitude CycleGAN to coarsely estimate the spectral magnitude of clean speech. In the second stage, we incorporate the…
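
The decoupling of a complex spectrum into magnitude and phase that the abstract describes can be illustrated with a minimal NumPy sketch. This is not the authors' CinCGAN implementation: the `stft`, `decouple`, and `recombine` helpers, the frame sizes, and the synthetic 440 Hz test signal are all assumptions made for illustration. It only shows why keeping the noisy phase leaves a residual error even when the magnitude estimate is perfect.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Hypothetical minimal STFT: Hann-windowed frames, one-sided FFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, frame_len // 2 + 1)

def decouple(spec):
    """Split a complex spectrum into spectral magnitude and phase."""
    return np.abs(spec), np.angle(spec)

def recombine(mag, phase):
    """Rebuild a complex spectrum from magnitude and phase."""
    return mag * np.exp(1j * phase)

# Synthetic data (an assumption for this sketch): a tone plus white noise.
rng = np.random.default_rng(0)
t = np.arange(4096) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noisy = clean + 0.5 * rng.standard_normal(t.shape)

clean_spec = stft(clean)
noisy_spec = stft(noisy)
clean_mag, clean_phase = decouple(clean_spec)
noisy_mag, noisy_phase = decouple(noisy_spec)

# Magnitude-only enhancement: oracle clean magnitude, phase left unaltered.
mag_only = recombine(clean_mag, noisy_phase)
# Joint estimate: oracle clean magnitude and oracle clean phase.
joint = recombine(clean_mag, clean_phase)

# Mean squared error of each estimate against the clean spectrum.
err_mag_only = np.mean(np.abs(mag_only - clean_spec) ** 2)
err_joint = np.mean(np.abs(joint - clean_spec) ** 2)
print(f"magnitude-only error: {err_mag_only:.4f}, joint error: {err_joint:.2e}")
```

With oracle values standing in for a trained model, the magnitude-only reconstruction still differs from the clean spectrum because of the noisy phase, while the joint reconstruction matches it up to floating-point rounding.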

Code & Models

Repositories

yuguochencuc/cincgan-se (Official)

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Speech Recognition and Synthesis