Incremental Text-to-Speech Synthesis with Prefix-to-Prefix Framework

Mingbo Ma; Baigong Zheng; Kaibo Liu; Renjie Zheng; Hairong Liu; Kainan Peng; Kenneth Church; Liang Huang

arXiv:1911.02750 · cs.CL · October 8, 2020 · 5 cites

TL;DR

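This paper introduces a neural incremental text-to-speech method built on the prefix-to-prefix framework. It synthesizes speech online, playing one audio segment while generating the next, which reduces latency from $O(n)$ to $O(1)$ in the sentence length and makes real-time applications practical.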

Contribution

It presents the first neural incremental TTS approach based on the prefix-to-prefix framework, achieving constant ($O(1)$) latency for online speech synthesis.

Findings

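01. Achieves $O(1)$ rather than $O(n)$ latency in speech synthesis.

02. Targets real-time scenarios where text arrives incrementally, such as simultaneous translation, dialog generation, and assistive technologies.

03. Reduces both input latency and computational latency in TTS systems.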

Abstract

Text-to-speech synthesis (TTS) has witnessed rapid progress in recent years, with neural methods becoming capable of producing highly natural audio. However, these efforts still suffer from two types of latency: (a) the computational latency (synthesizing time), which grows linearly with the sentence length even with parallel approaches, and (b) the input latency in scenarios where the input text is incrementally generated (such as in simultaneous translation, dialog generation, and assistive technologies). To reduce these latencies, we devise the first neural incremental TTS approach based on the recently proposed prefix-to-prefix framework. We synthesize speech in an online fashion, playing a segment of audio while generating the next, resulting in an $O(1)$ rather than $O(n)$ latency.
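
The latency claim can be illustrated with a toy timing model. The sketch below is not the paper's system: the function names, the fixed per-chunk synthesis cost, and the fixed audio duration are all assumptions chosen for illustration. It compares the wait before the first audio when the whole sentence is synthesized up front against the wait when each chunk is played while the next one is being generated.

```python
from typing import List, Tuple


def full_sentence_latency(synth_ms: List[int]) -> int:
    """Wait before the first audio when every chunk is synthesized before playback."""
    return sum(synth_ms)


def incremental_playback(synth_ms: List[int], audio_ms: List[int]) -> Tuple[int, int]:
    """Simulate playing each chunk while the next one is synthesized.

    Returns (wait before the first audio, total stall time between chunks).
    All times are in milliseconds; the per-chunk numbers are hypothetical.
    """
    if len(synth_ms) != len(audio_ms):
        raise ValueError("need one synthesis cost and one duration per chunk")
    ready = 0        # time at which the current chunk finishes synthesizing
    play_end = 0     # time at which the previous chunk finishes playing
    first_audio = 0
    stall = 0
    for i, (cost, duration) in enumerate(zip(synth_ms, audio_ms)):
        ready += cost  # chunks are synthesized one after another
        if i == 0:
            first_audio = start = ready
        else:
            # Playback resumes once the chunk is ready and the previous one ended.
            start = max(ready, play_end)
            stall += start - play_end
        play_end = start + duration
    return first_audio, stall


if __name__ == "__main__":
    for n in (5, 50):
        synth = [100] * n   # assumed synthesis cost per chunk
        audio = [300] * n   # assumed audio duration per chunk
        print(n, full_sentence_latency(synth), incremental_playback(synth, audio))
```

In this model the full-sentence wait grows with the number of chunks, while the incremental wait equals the cost of the first chunk regardless of sentence length. Playback stays free of stalls only when each chunk is synthesized faster than the previous chunk takes to play.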

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques