# Towards Generalized Speech Enhancement with Generative Adversarial Networks

**Authors:** Santiago Pascual, Joan Serrà, Antonio Bonafonte

arXiv: 1904.03418 · 2019-04-09

## TL;DR

This paper treats speech enhancement as a generalized task and tackles it with a time-domain GAN that reconstructs speech degraded by several aggressive signal distortions, improving naturalness and speaker identity preservation.

## Contribution

It extends a previous GAN-based speech enhancement system to handle mixtures of four types of aggressive distortions, adding an adversarial acoustic regression loss at the discriminator and a two-step adversarial training schedule.

## Key findings

- Both objective and subjective evaluations show improved speech reconstructions.
- The reconstructions better match the original speaker identity and naturalness once the two proposed additions are in place.
- The system handles mixtures of four types of aggressive distortions.

## Abstract

The speech enhancement task usually consists of removing additive noise or reverberation that partially mask spoken utterances, affecting their intelligibility. However, little attention is drawn to other, perhaps more aggressive signal distortions like clipping, chunk elimination, or frequency-band removal. Such distortions can have a large impact not only on intelligibility, but also on naturalness or even speaker identity, and require careful signal reconstruction. In this work, we give full consideration to this generalized speech enhancement task, and show it can be tackled with a time-domain generative adversarial network (GAN). In particular, we extend a previous GAN-based speech enhancement system to deal with mixtures of four types of aggressive distortions. Firstly, we propose the addition of an adversarial acoustic regression loss that promotes a richer feature extraction at the discriminator. Secondly, we also make use of a two-step adversarial training schedule, acting as a warm-up-and-fine-tune sequence. Both objective and subjective evaluations show that these two additions bring improved speech reconstructions that better match the original speaker identity and naturalness.
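
To make the distortions named in the abstract concrete, the following is a minimal sketch of clipping, chunk elimination, and frequency-band removal applied to a toy signal. It is not the authors' data pipeline: the function names, the clipping threshold, the chunk length, the band limits, and the synthetic two-tone input are all assumptions chosen for illustration, and the abstract does not specify how the paper parameterizes its four distortion types.

```python
import cmath
import math
import random


def clip(signal, threshold):
    """Hard-clip every sample to the range [-threshold, threshold]."""
    return [max(-threshold, min(threshold, s)) for s in signal]


def eliminate_chunks(signal, chunk_len, n_chunks, rng):
    """Zero out n_chunks randomly placed segments of chunk_len samples."""
    out = list(signal)
    for _ in range(n_chunks):
        start = rng.randrange(0, len(out) - chunk_len + 1)
        out[start:start + chunk_len] = [0.0] * chunk_len
    return out


def remove_band(signal, low_bin, high_bin):
    """Zero DFT bins low_bin..high_bin and their mirrored counterparts,
    then transform back. Uses a naive O(n^2) DFT, fine for a short demo."""
    n = len(signal)
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(low_bin, high_bin + 1):
        spectrum[k] = 0
        spectrum[(n - k) % n] = 0  # keep the output real-valued
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]


# Hypothetical toy input: a low tone (bin 4) plus a weaker high tone (bin 20).
N = 128
clean = [
    math.sin(2 * math.pi * 4 * t / N) + 0.5 * math.sin(2 * math.pi * 20 * t / N)
    for t in range(N)
]

rng = random.Random(0)  # fixed seed so the chunk positions are repeatable
clipped = clip(clean, threshold=0.5)
chunked = eliminate_chunks(clean, chunk_len=16, n_chunks=2, rng=rng)
band_removed = remove_band(clean, low_bin=16, high_bin=24)
```

In this sketch each distortion discards information in a different way: clipping flattens peaks, chunk elimination deletes whole time segments, and band removal deletes a frequency range (here leaving only the low tone). Recovering such missing content is a reconstruction problem rather than a denoising one, which is the motivation the abstract gives for a generative model.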

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.03418/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1904.03418/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/1904.03418/full.md

---
Source: https://tomesphere.com/paper/1904.03418