FeatherTTS: Robust and Efficient attention based Neural TTS

Qiao Tian; Zewang Zhang; Chao Liu; Heng Lu; Linghui Chen; Bin Wei; Pujiang He; Shan Liu

arXiv:2011.00935 · eess.AS · November 3, 2020 · 1 citation

Open Access

TL;DR

FeatherTTS is a robust, efficient attention-based neural TTS system that combines Gaussian attention with block sparsity, keeping speech natural while synthesizing 35x faster than real time on CPU.

Contribution

It proposes a novel Gaussian attention mechanism, which replaces stop token prediction with attentive stop prediction, and applies block sparsity to the autoregressive decoder, improving robustness and synthesis speed over Tacotron.
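As a rough illustration of what block sparsity means for a decoder weight matrix, here is a minimal sketch in plain Python. It assumes magnitude-based pruning, where the blocks with the smallest summed absolute weight are zeroed; this page does not say which pruning criterion, block size, or sparsity level the paper uses. The function name `block_sparsify` and its parameters are hypothetical.

```python
def block_sparsify(weights, block_rows, block_cols, sparsity):
    """Return a copy of a 2-D weight matrix with its weakest blocks zeroed.

    Assumption: blocks are ranked by the sum of absolute values of their
    entries, and the lowest `sparsity` fraction of blocks is pruned.
    """
    rows, cols = len(weights), len(weights[0])
    if rows % block_rows or cols % block_cols:
        raise ValueError("matrix shape must be a multiple of the block shape")

    # Score every block by the total magnitude of its entries.
    blocks = []
    for r in range(0, rows, block_rows):
        for c in range(0, cols, block_cols):
            score = sum(
                abs(weights[i][j])
                for i in range(r, r + block_rows)
                for j in range(c, c + block_cols)
            )
            blocks.append((score, r, c))

    # Zero the lowest-scoring blocks in a copy; the input is left untouched.
    blocks.sort(key=lambda block: block[0])
    num_pruned = int(len(blocks) * sparsity)
    pruned = [row[:] for row in weights]
    for _, r, c in blocks[:num_pruned]:
        for i in range(r, r + block_rows):
            for j in range(c, c + block_cols):
                pruned[i][j] = 0.0
    return pruned
```

Because whole blocks are zero, a sparse matrix-vector kernel can skip them entirely, which is where the speedup in an autoregressive decoder would come from.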

Findings

01

Nearly eliminates word skipping and repetition issues.

02

Speeds up acoustic feature generation by 3.5 times over Tacotron.

03

Achieves 35x real-time speed on CPU.

Abstract

Attention-based neural TTS is an elegant speech synthesis pipeline and has shown a powerful ability to generate natural speech. However, it is still not robust enough to meet the stability requirements of industrial products. In addition, it suffers from slow inference speed owing to the autoregressive generation process. In this work, we propose FeatherTTS, a robust and efficient attention-based neural TTS system. First, we propose a novel Gaussian attention which exploits the interpretability of Gaussian attention and the strictly monotonic property of TTS. With this method, we replace the commonly used stop token prediction architecture with attentive stop prediction. Second, we apply block sparsity to the autoregressive decoder to speed up speech synthesis. The experimental results show that our proposed FeatherTTS not only nearly eliminates the problem of word skipping, repeating in…
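The idea of a monotonic Gaussian attention with an attention-driven stop can be sketched as follows. This is a minimal illustration, not the paper's formulation: it assumes the Gaussian mean advances by a softplus-transformed (hence positive) step at each decoder frame, that the weights are normalized over input tokens, and that decoding stops once the mean moves past the last token. In a real model the raw step and width values would come from the decoder state; here they are passed in as a plain list. All names (`gaussian_attention_step`, `decode`, `raw_params`) are hypothetical.

```python
import math


def softplus(x):
    # Numerically stable softplus; the result is never negative.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def gaussian_attention_step(prev_mean, raw_delta, raw_sigma, num_tokens):
    """Compute one frame of Gaussian attention weights over input tokens."""
    # The mean only moves forward, so the alignment is monotonic by design.
    mean = prev_mean + softplus(raw_delta)
    sigma = softplus(raw_sigma) + 1e-6  # small floor avoids division by zero
    scores = [
        math.exp(-((j - mean) ** 2) / (2.0 * sigma ** 2))
        for j in range(num_tokens)
    ]
    total = sum(scores)
    # Normalizing the weights is an assumption made for readability.
    weights = [s / total for s in scores] if total > 0.0 else scores
    return mean, weights


def decode(raw_params, num_tokens):
    """Collect alignments until the attention mean passes the last token."""
    mean = 0.0
    alignments = []
    for raw_delta, raw_sigma in raw_params:
        mean, weights = gaussian_attention_step(
            mean, raw_delta, raw_sigma, num_tokens
        )
        # Attentive stop (assumed rule): no separate stop-token predictor,
        # the position of the attention itself signals the end of the text.
        if mean > num_tokens - 1:
            break
        alignments.append(weights)
    return alignments
```

With this stopping rule the decoder cannot end before the attention has reached the end of the input or continue long after it, which is one plausible reading of how such a design reduces skipped and repeated words.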

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling