AeGAN: Time-Frequency Speech Denoising via Generative Adversarial Networks

Sherif Abdulatif, Karim Armanious, Karim Guirguis, Jayasankar T. Sajeev, Bin Yang

arXiv:1910.12620·eess.AS·December 29, 2020

TL;DR

This paper introduces AeGAN, a GAN-based framework for time-frequency speech denoising. Cleaner speech is intended to benefit automatic speech recognition and related applications such as hearing aids, smartphones and teleconferencing systems.

Contribution

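The paper proposes a new GAN architecture built on a CasNet generator, with an additional feature-based loss, to obtain more realistic denoised speech. The authors report that it outperforms other learning-based and traditional model-based speech enhancement approaches.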

Findings

01. Outperforms other learning-based and traditional model-based speech enhancement approaches.

02. Produces more realistic denoised speech that preserves phonetics.

03. Demonstrates effectiveness in noisy, real-world environments.

Abstract

Automatic speech recognition (ASR) systems are of vital importance nowadays in commonplace tasks such as speech-to-text processing and language translation. This has created the need for an ASR system that can operate in realistic crowded environments. Thus, speech enhancement is a valuable building block in ASR systems and other applications such as hearing aids, smartphones and teleconferencing systems. In this paper, a generative adversarial network (GAN) based framework is investigated for the task of speech enhancement, more specifically speech denoising of audio tracks. A new architecture based on a CasNet generator and an additional feature-based loss are incorporated to get realistically denoised speech phonetics. Finally, the proposed framework is shown to outperform other learning-based and traditional model-based speech enhancement approaches.
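
The abstract names an additional feature-based loss but does not give its form. As a minimal sketch only, assume the loss compares intermediate feature maps extracted from the denoised and the clean spectrogram, layer by layer, using a mean absolute difference. This is a common choice for feature-based GAN losses and is not confirmed here as the authors' exact formulation. The function name `feature_based_loss`, the per-layer `weights`, and the representation of each feature map as a flat list of floats are all hypothetical.

```python
def feature_based_loss(feats_denoised, feats_clean, weights=None):
    """Weighted sum over layers of the mean absolute feature difference.

    Assumption (not taken from the paper): each argument is a list with one
    entry per layer, and each entry is a feature map flattened to floats.
    """
    if len(feats_denoised) != len(feats_clean):
        raise ValueError("both inputs must contain the same number of layers")
    if weights is None:
        weights = [1.0] * len(feats_clean)  # hypothetical default: equal weights
    if len(weights) != len(feats_clean):
        raise ValueError("one weight is required per layer")

    total = 0.0
    for w, fd, fc in zip(weights, feats_denoised, feats_clean):
        if len(fd) != len(fc) or len(fc) == 0:
            raise ValueError("paired feature maps must be non-empty and equal in size")
        # Mean absolute difference for this layer.
        layer_mae = sum(abs(a - b) for a, b in zip(fd, fc)) / len(fc)
        total += w * layer_mae
    return total


# Toy example with two layers of made-up feature values.
clean = [[0.0, 0.0, 0.0], [1.0, 1.0]]
denoised = [[1.0, 1.0, 1.0], [1.0, 1.0]]
print(feature_based_loss(denoised, clean))
```

In a GAN training loop, a term of this kind would typically be added to the adversarial generator loss with a weighting factor. Where the feature maps come from (for example the discriminator or a separate pretrained network) is left open here.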
