TL;DR
This paper introduces a CNN-based text-to-speech system that trains significantly faster than RNN-based methods, reaching nearly acceptable speech quality after 15 hours of training on an ordinary gaming PC.
Contribution
It demonstrates that a fully convolutional neural TTS system can be trained efficiently without recurrent units, reducing training time and computational costs.
Findings
Training completed in 15 hours on a standard gaming PC with GPUs.
Synthesized speech quality was nearly acceptable.
CNN-based TTS offers faster training compared to RNN-based methods.
Abstract
This paper describes a novel text-to-speech (TTS) technique based on deep convolutional neural networks (CNN), without the use of any recurrent units. Recurrent neural networks (RNN) have recently become a standard technique for modeling sequential data, and they have been used in some cutting-edge neural TTS techniques. However, training RNN components often requires a very powerful computer or a very long time, typically several days or weeks. Other recent studies, on the other hand, have shown that CNN-based sequence synthesis can be much faster than RNN-based techniques because of its high parallelizability. The objective of this paper is to show that an alternative neural TTS based only on CNN can alleviate these economic costs of training. In our experiment, the proposed Deep Convolutional TTS was sufficiently trained overnight (15 hours), using an ordinary gaming PC equipped with two…
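The parallelizability argument in the abstract can be illustrated with a minimal sketch. This is not the paper's architecture or code: the function names, kernel weights, and input values below are hypothetical, chosen only to contrast a causal 1-D convolution (each output depends only on the input) with a simple recurrence (each output depends on the previous output).

```python
# Illustrative sketch only; not the paper's model.
# All names (causal_conv1d, simple_rnn) and all numbers are hypothetical.

def causal_conv1d(x, kernel):
    """Causal 1-D convolution: y[t] = sum_j kernel[j] * x[t - j], zero-padded on the left."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(x)
    # Each output reads only from the input, never from another output,
    # so all time steps could be computed independently (in parallel).
    return [
        sum(kernel[j] * padded[t + k - 1 - j] for j in range(k))
        for t in range(len(x))
    ]

def simple_rnn(x, w_in, w_rec):
    """Linear recurrence: h[t] = w_in * x[t] + w_rec * h[t - 1]."""
    h, out = 0.0, []
    # Step t needs the result of step t - 1, so the loop is inherently sequential.
    for value in x:
        h = w_in * value + w_rec * h
        out.append(h)
    return out

signal = [1.0, 2.0, 3.0, 4.0]
conv_out = causal_conv1d(signal, kernel=[0.5, 0.25])
rnn_out = simple_rnn(signal, w_in=0.5, w_rec=0.5)
```

In the convolution, the list comprehension could be evaluated in any order or all at once, which is what makes training on parallel hardware efficient. The recurrence has to walk the sequence one step at a time.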
Code & Models
- 🤗 dathudeptrai/tts-tacotron2-synpaflex-fr (model, ♡ 1)
- 🤗 tensorspeech/tts-tacotron2-baker-ch (model, ♡ 7)
- 🤗 tensorspeech/tts-tacotron2-kss-ko (model, ♡ 5)
- 🤗 tensorspeech/tts-tacotron2-ljspeech-en (model)
- 🤗 tensorspeech/tts-tacotron2-synpaflex-fr (model)
- 🤗 tensorspeech/tts-tacotron2-thorsten-ger (model)
- 🤗 infinisoft/tts (model, ♡ 4)
- 🤗 Bilgilice/bilgilice35 (model)
- 🤗 praveenchordia/tts (model, ♡ 1)
- 🤗 antoniomae1234/voice-xtts2 (model)
Taxonomy
Methods: 1-Dimensional Convolutional Neural Networks
