# Expediting TTS Synthesis with Adversarial Vocoding

**Authors:** Paarth Neekhara, Chris Donahue, Miller Puckette, Shlomo Dubnov, Julian McAuley

arXiv: 1904.07944 · 2019-07-29

## TL;DR

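This paper uses a GAN to map perceptually-informed spectrograms to simple magnitude spectrograms that can be heuristically vocoded. In a user study the approach significantly outperforms naïve vocoding strategies, and it runs hundreds of times faster than the neural network vocoders used in state-of-the-art TTS systems.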

## Contribution

It presents a GAN-based alternative to neural vocoding that removes the computational bottleneck vocoding creates in modern TTS pipelines, and it achieves state-of-the-art results in unsupervised synthesis of individual words of speech.

## Key findings

- Significantly outperforms naïve vocoding strategies in a user study
- Runs hundreds of times faster than the neural network vocoders used in state-of-the-art TTS systems
- Achieves state-of-the-art results in unsupervised synthesis of individual words of speech

## Abstract

Recent approaches in text-to-speech (TTS) synthesis employ neural network strategies to vocode perceptually-informed spectrogram representations directly into listenable waveforms. Such vocoding procedures create a computational bottleneck in modern TTS pipelines. We propose an alternative approach which utilizes generative adversarial networks (GANs) to learn mappings from perceptually-informed spectrograms to simple magnitude spectrograms which can be heuristically vocoded. Through a user study, we show that our approach significantly outperforms naïve vocoding strategies while being hundreds of times faster than neural network vocoders used in state-of-the-art TTS systems. We also show that our method can be used to achieve state-of-the-art results in unsupervised synthesis of individual words of speech.
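
To make the mapping in the abstract concrete, here is a minimal sketch of the step the GAN is trained to replace: estimating a magnitude spectrogram from a perceptually-informed one. It assumes the perceptual representation is a mel spectrogram and uses a filterbank pseudoinverse as one possible naïve baseline. The filter count, FFT size, sample rate, random test data, and function names are all illustrative assumptions, not values taken from the paper. The phase-estimation heuristic that would turn the magnitude estimate into a waveform (for example Griffin-Lim) is not shown.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (one common convention; assumed here)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular mel filters with shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_mels + 2 band edges, evenly spaced on the mel scale
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def naive_mel_inversion(mel_spec, fb):
    """Hypothetical naive baseline: pseudoinverse of the filterbank,
    with negative magnitudes clipped to zero."""
    return np.maximum(0.0, np.linalg.pinv(fb) @ mel_spec)

# Illustrative settings (assumptions, not the paper's configuration)
rng = np.random.default_rng(0)
fb = mel_filterbank(n_mels=80, n_fft=1024, sample_rate=22050)

# Stand-in for a real magnitude spectrogram: 513 frequency bins x 100 frames
magnitude = np.abs(rng.standard_normal((513, 100)))

# Mel compression maps 513 bins down to 80 bands, so information is lost
mel_spec = fb @ magnitude

# The naive estimate cannot recover what the compression discarded
estimate = naive_mel_inversion(mel_spec, fb)
rel_error = np.linalg.norm(estimate - magnitude) / np.linalg.norm(magnitude)
print(f"relative reconstruction error: {rel_error:.3f}")
```

Because the filterbank has far fewer rows than columns, the inversion is underdetermined and the reconstruction error is nonzero. That gap is what a learned mapping, such as the GAN described in the abstract, would aim to close before heuristic vocoding.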

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.07944/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1904.07944/full.md

## References

29 references — full list in the complete paper: https://tomesphere.com/paper/1904.07944/full.md

---
Source: https://tomesphere.com/paper/1904.07944