Tdcgan: Temporal Dilated Convolutional Generative Adversarial Network for End-to-end Speech Enhancement

Shuaishuai Ye; Xinhui Hu; Xinkang Xu

arXiv:2008.07787 · eess.AS · October 1, 2020 · 6 cites

Open Access

TL;DR

This paper introduces TDCGAN, an end-to-end speech enhancement GAN that combines temporal dilated convolutions with an SNR penalty regularizer on the generator loss. It reports better enhancement than prior end-to-end GAN-based methods while using fewer parameters.

Contribution

The paper presents the first integration of temporal dilated convolutions with depthwise separable convolutions in GANs for speech enhancement and explores SNR-based regularization.
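
To make the two ingredients concrete, the following is a minimal sketch, not the authors' architecture. It counts the receptive field of a stack of stride-1 dilated 1-D convolutions and compares the weight count of an ordinary convolution with a depthwise separable one. The kernel size, channel count, depth, and dilation schedule are hypothetical values chosen only for illustration; the paper's actual settings are not given in this summary.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 dilated 1-D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def standard_conv_params(in_ch, out_ch, kernel_size):
    """Weight count of an ordinary 1-D convolution (bias omitted)."""
    return in_ch * out_ch * kernel_size


def separable_conv_params(in_ch, out_ch, kernel_size):
    """Weight count of a depthwise conv followed by a 1x1 pointwise conv (bias omitted)."""
    depthwise = in_ch * kernel_size   # one small filter per input channel
    pointwise = in_ch * out_ch        # 1x1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical settings, for illustration only (not taken from the paper).
KERNEL = 3
CHANNELS = 256
plain = [1] * 8                        # eight layers without dilation
dilated = [2 ** i for i in range(8)]   # dilation doubles per layer: 1, 2, ..., 128

print(receptive_field(KERNEL, plain))                     # 17
print(receptive_field(KERNEL, dilated))                   # 511
print(standard_conv_params(CHANNELS, CHANNELS, KERNEL))   # 196608
print(separable_conv_params(CHANNELS, CHANNELS, KERNEL))  # 66304
```

Dilation changes only the spacing between kernel taps, so both stacks above have the same number of weights per layer while the dilated stack sees 511 samples instead of 17. The separable factorization is what lowers the per-layer weight count.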

Findings

01 Outperforms state-of-the-art GAN-based speech enhancement methods

02 Reduces the number of model parameters significantly

03 SNR penalty regularization improves speech SNR more effectively than L1

Abstract

To address the performance degradation caused by ignoring phase information in conventional speech enhancement systems, we propose a temporal dilated convolutional generative adversarial network (TDCGAN) for end-to-end speech enhancement. For the first time, we introduce a temporal dilated convolutional network with depthwise separable convolutions into the GAN structure, so that the receptive field can be greatly increased without increasing the number of parameters. We are also the first to explore the effect of a signal-to-noise ratio (SNR) penalty term, used as a regularizer in the generator's loss function, on improving the SNR of the enhanced speech. Experimental results demonstrate that the proposed method outperforms state-of-the-art end-to-end GAN-based speech enhancement. Moreover, compared with previous GAN-based methods,…
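
One way to read the SNR penalty idea is sketched below. This is an assumption about the general shape of such a regularizer, not the paper's exact loss: the generator's adversarial term is combined with a negative SNR term, so that minimizing the loss pushes the SNR of the enhanced waveform up. The function names, the `weight` parameter, and the toy waveforms are all hypothetical.

```python
import math


def snr_db(clean, enhanced, eps=1e-12):
    """SNR in dB of an enhanced waveform, treating (clean - enhanced) as residual noise."""
    signal = sum(c * c for c in clean)
    noise = sum((c - e) ** 2 for c, e in zip(clean, enhanced))
    return 10.0 * math.log10((signal + eps) / (noise + eps))


def l1_penalty(clean, enhanced):
    """Mean absolute error, the L1 term that the SNR penalty is compared against."""
    return sum(abs(c - e) for c, e in zip(clean, enhanced)) / len(clean)


def generator_loss(adv_term, clean, enhanced, weight=1.0):
    """Adversarial term plus a negative-SNR regularizer (assumed form, see lead-in)."""
    return adv_term - weight * snr_db(clean, enhanced)


# Toy waveforms, for illustration only.
clean = [1.0, -1.0, 1.0, -1.0]
closer = [0.9, -0.9, 0.9, -0.9]    # small residual, roughly 20 dB SNR
farther = [0.5, -0.5, 0.5, -0.5]   # larger residual, lower SNR

print(snr_db(clean, closer), snr_db(clean, farther))
print(generator_loss(0.5, clean, closer), generator_loss(0.5, clean, farther))
```

With the adversarial term held fixed, the estimate with the smaller residual gets the lower loss, which is the behavior a regularizer aimed at raising SNR needs.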

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing