Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention

Hideyuki Tachibana; Katsuya Uenoyama; Shunsuke Aihara

arXiv:1710.08969 · cs.SD · October 1, 2020

TL;DR

This paper introduces a CNN-based text-to-speech system that trains significantly faster than RNN-based methods, achieving near-acceptable speech quality within 15 hours on standard hardware.

Contribution

It demonstrates that a fully convolutional neural TTS system can be trained efficiently without recurrent units, reducing training time and computational costs.

Findings

01 Training completed in 15 hours on a standard gaming PC with GPUs.

02 Synthesized speech quality was nearly acceptable.

03 CNN-based TTS offers faster training compared to RNN-based methods.

Abstract

This paper describes a novel text-to-speech (TTS) technique based on deep convolutional neural networks (CNN), without the use of any recurrent units. Recurrent neural networks (RNN) have recently become a standard technique for modeling sequential data, and they have been used in some cutting-edge neural TTS techniques. However, training RNN components often requires a very powerful computer or a very long time, typically several days or weeks. Other recent studies, on the other hand, have shown that CNN-based sequence synthesis can be much faster than RNN-based techniques because of its high parallelizability. The objective of this paper is to show that an alternative neural TTS based only on CNN can alleviate these economic costs of training. In our experiment, the proposed Deep Convolutional TTS was sufficiently trained overnight (15 hours), using an ordinary gaming PC equipped with two…
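
The parallelizability argument in the abstract can be illustrated with a small toy sketch. This is not the paper's model: the function names, the filter weights, and the simple linear recurrence below are all assumptions chosen for illustration. The point it tries to show is that each output of a causal 1-D convolution depends only on the inputs, so the time steps can be computed in any order (and hence in parallel), whereas a recurrent update needs the previous hidden state first.

```python
def conv_output(x, w, t):
    """One output of a causal 1-D convolution: y[t] = sum_k w[k] * x[t - k].

    Inputs before the start of the sequence are treated as zero.
    """
    return sum(w[k] * x[t - k] for k in range(len(w)) if t - k >= 0)


def causal_conv1d(x, w, order=None):
    """Compute every output, visiting time steps in an arbitrary order.

    `order` is a hypothetical parameter used only to demonstrate that the
    result does not depend on the order in which time steps are processed.
    """
    if order is None:
        order = range(len(x))
    y = [None] * len(x)
    for t in order:
        y[t] = conv_output(x, w, t)  # uses only x, never another y value
    return y


def linear_recurrence(x, a):
    """Toy recurrent update h[t] = a * h[t-1] + x[t]; inherently sequential."""
    h, states = 0.0, []
    for x_t in x:
        h = a * h + x_t  # needs the previous state before it can proceed
        states.append(h)
    return states


x = [1.0, 2.0, 3.0, 4.0]
w = [0.5, 0.25]

forward = causal_conv1d(x, w)
backward = causal_conv1d(x, w, order=reversed(range(len(x))))
print(forward == backward)  # the convolution result is order-independent
print(linear_recurrence(x, 0.5))
```

In a real system the independent time steps would be dispatched to a GPU at once rather than looped over; the loop here only makes the lack of dependency between outputs explicit.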

Taxonomy

Methods: 1-Dimensional Convolutional Neural Networks