UNetGAN: A Robust Speech Enhancement Approach in Time Domain for Extremely Low Signal-to-noise Ratio Condition
Xiang Hao, Xiangdong Su, Zhiyu Wang, Hui Zhang, Batushiren

TL;DR
This paper introduces UNetGAN, a time-domain speech enhancement method that combines a U-Net-style generator with generative adversarial training. It is designed to improve speech quality in extremely low SNR conditions and outperforms SEGAN, PSA-BLSTM, and Wave-U-Net.
Contribution
It presents a robust GAN-based speech enhancement model operating directly in the time domain, tailored for very low SNR scenarios, with superior performance over prior deep learning approaches.
Findings
Significant improvement in speech quality at SNRs down to -20 dB.
Outperforms SEGAN, PSA-BLSTM, and Wave-U-Net in STOI and PESQ metrics.
Demonstrates robustness and effectiveness of the proposed UNetGAN model.
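To make the -20 dB figure concrete: SNR in decibels is 10·log10(P_speech / P_noise), so at -20 dB the noise carries 100 times the power of the speech. The following is a minimal sketch of how a noisy test signal at a chosen SNR could be constructed. It is not taken from the paper; the 16 kHz sampling rate, the sine-wave stand-in for speech, the Gaussian noise, and the helper names `power`, `snr_db`, and `mix_at_snr` are all illustrative assumptions.

```python
import math
import random

def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)

def snr_db(clean, noise):
    """SNR in dB: 10 * log10(P_clean / P_noise)."""
    return 10.0 * math.log10(power(clean) / power(noise))

def mix_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so the clean-to-noise power ratio hits the target SNR.

    Returns (noisy_signal, scaled_noise). Hypothetical helper for illustration.
    """
    ratio = 10.0 ** (target_snr_db / 10.0)          # desired P_clean / P_noise
    gain = math.sqrt(power(clean) / (power(noise) * ratio))
    scaled = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled)]
    return noisy, scaled

# Assumed setup: 1 second at 16 kHz, a 220 Hz tone standing in for speech.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

noisy, scaled_noise = mix_at_snr(clean, noise, -20.0)
```

After the call, `snr_db(clean, scaled_noise)` evaluates to -20 dB up to floating-point error, and the scaled noise has roughly 100 times the power of the clean signal, which is why enhancement in this regime is so difficult.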
Abstract
Speech enhancement under extremely low signal-to-noise ratio (SNR) conditions is a very challenging problem and has rarely been investigated in previous work. This paper proposes a robust speech enhancement approach (UNetGAN) based on U-Net and generative adversarial learning to deal with this problem. The approach consists of a generator network and a discriminator network, both of which operate directly in the time domain. The generator network adopts a U-Net-like structure and employs dilated convolution in its bottleneck. We evaluate the performance of UNetGAN at low SNR conditions (down to -20 dB) on the public benchmark. The results demonstrate that it significantly improves speech quality and substantially outperforms representative deep learning models, including SEGAN, cGAN for SE, Bidirectional LSTM using phase-sensitive spectrum approximation cost function (PSA-BLSTM) and…
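The abstract mentions dilated convolution in the generator's bottleneck. As a rough illustration of why dilation is useful on raw waveforms, the sketch below implements a plain 1-D dilated convolution (in the cross-correlation form used by most deep learning libraries) and computes the receptive field of a stack of such layers. The kernel size and dilation rates are assumptions chosen for illustration, not values reported in the paper, and `dilated_conv1d` and `receptive_field` are hypothetical helper names.

```python
def dilated_conv1d(x, w, dilation=1):
    """Valid 1-D dilated convolution (cross-correlation form).

    y[n] = sum_k w[k] * x[n + k * dilation]
    """
    span = (len(w) - 1) * dilation                  # samples covered beyond the first tap
    return [
        sum(w[k] * x[n + k * dilation] for k in range(len(w)))
        for n in range(len(x) - span)
    ]

def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 dilated conv layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# A 3-tap kernel with dilation 2 reads samples n, n+2 and n+4.
y = dilated_conv1d([1, 2, 3, 4, 5, 6, 7], [1, 0, -1], dilation=2)

# Assumed example: four layers, kernel size 3.
rf_plain = receptive_field(3, [1, 1, 1, 1])         # no dilation
rf_dilated = receptive_field(3, [1, 2, 4, 8])       # exponentially growing dilation
```

With the same number of layers and weights, the dilated stack covers 31 samples against 9 for the undilated one, so a bottleneck built this way can take in more temporal context without further downsampling.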
Taxonomy
Methods: Max Pooling · Concatenated Skip Connection · U-Net · Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Dilated Convolution · Convolution
